Abstract: In this paper, we study the following model of hidden Markov chain: $Y_i = X_i + \varepsilon_i$, $i = 1, \dots, n+1$, with $(X_i)$ a real-valued stationary Markov chain and $(\varepsilon_i)_{1 \le i \le n+1}$ a noise having a known distribution and independent of the sequence $(X_i)$. We present an estimator of the transition density obtained by minimization of an original contrast that takes advantage of the regressive aspect of the problem. It is selected among a collection of projection estimators with a model selection method. The $L^2$-risk and its rate of convergence are evaluated for ordinary smooth noise, and some simulations illustrate the method. We obtain uniform risk bounds over classes of Besov balls. In addition, our estimation procedure requires no prior knowledge of the regularity of the true transition. Finally, our estimator avoids the drawbacks of quotient estimators.
1. Introduction

In this paper we consider the following additive hidden Markov model:

$$Y_i = X_i + \varepsilon_i, \qquad i = 1, \dots, n+1 \tag{1}$$

with $(X_i)_{i \ge 1}$ a real-valued Markov chain, $(\varepsilon_i)_{i \ge 1}$ a sequence of independent and identically distributed variables, and

$$(X_i)_{i \ge 1} \text{ and } (\varepsilon_i)_{i \ge 1} \text{ independent.} \tag{2}$$

Only the variables $Y_1, \dots, Y_{n+1}$ are observed. Besides its initial distribution, the chain $(X_i)_{i \ge 1}$ is characterized by its transition, i.e. the distribution of $X_{i+1}$ given $X_i$. We assume that this transition has a density $\Pi$, defined by $\Pi(x, y)\,dy = P(X_{i+1} \in dy \mid X_i = x)$, and our aim is to estimate this transition density $\Pi$.
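To make model (1)-(2) concrete, here is a minimal Python sketch (NumPy) that draws one sample path. It assumes, for illustration only, that the hidden chain is the Gaussian AR(1) process AR(i) of Section 5 and that the noise is the Laplace noise with $\lambda = 5$ used there; the function name `simulate_observations` and its parameters are hypothetical. The returned `x` is kept only for checking: in the model, only `y` is available.

```python
import numpy as np

def simulate_observations(n, a=2/3, b=0.0, sigma2=5/9, lam=5.0, seed=None):
    """Draw (X_1..X_{n+1}, Y_1..Y_{n+1}) from model (1)-(2) under illustrative choices:
    hidden chain X_{i+1} = a X_i + b + e_{i+1} (Gaussian AR(1), |a| < 1), started from
    its stationary law N(b/(1-a), sigma2/(1-a^2)); noise eps_i i.i.d. Laplace with
    density (lam/2) exp(-lam |x|), drawn independently of the chain."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = rng.normal(b / (1 - a), np.sqrt(sigma2 / (1 - a**2)))
    innovations = rng.normal(0.0, np.sqrt(sigma2), size=n)
    for i in range(n):
        x[i + 1] = a * x[i] + b + innovations[i]
    eps = rng.laplace(0.0, 1.0 / lam, size=n + 1)  # NumPy's scale is 1/lam
    return x, x + eps                              # only the second array is observed
```

For example, `x, y = simulate_observations(1000, seed=0)` returns a hidden path and its noisy version of length 1001.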
This model belongs to the class of hidden Markov models (HMMs). Hidden Markov models constitute a very well-known class of discrete-time stochastic processes, with many applications in areas such as biology, speech recognition or finance. For a general reference on these models, we refer to Cappé et al. (2005). Here, we study a simple HMM in which the noise is additive (which also allows dealing with multiplicative noise by taking a logarithm). In standard HMMs, it is assumed that the joint density of $(X_i, Y_i)$ has a parametric form, and the aim is then to infer the parameter from the observations $Y_1, \dots, Y_n$, generally by maximizing the likelihood. For this type of study, see, among others, Baum and Petrie (1966), Leroux (1992), Bakry et al. (1997), Bickel et al. (1998), Jensen and Petersen (1999), Douc et al. (2004) and Fuh (2006).



This model is also similar to the so-called convolution model, in which the aim is to estimate the density of $(X_i)_{i \ge 1}$. As in that model, we use the Fourier transform extensively. The restrictions on the error distribution and the rate of convergence obtained for our estimator are also of the same kind. Related works include Stefanski (1990), Fan (1993), Masry (1993) (for the multivariate case), Pensky and Vidakovic (1999) and Comte et al. (2006).

The estimation of the transition density of a hidden Markov chain is studied by Clémençon (2003). His estimator is based on the thresholding of a wavelet-vaguelette decomposition. Its drawback is that it does not achieve the minimax rate, because of a logarithmic loss. Lacour (2007b) describes an estimation procedure based on the quotient of an estimator of the joint density and an estimator of the stationary density $f$. This estimator reaches the minimax rate if it is assumed that $f$ and $f\Pi$ have regularity $\alpha$. But this smoothness condition on $f$ raises a problem. Indeed, Clémençon (2000) gives an example in which the stationary density $f$ is not continuous, whereas the transition density $\Pi$ is constant. This shows that $f$ can be much less regular than $\Pi$. Therefore, our aim is to find an estimator of the transition density that does not have the above-mentioned disadvantages.

To estimate $\Pi$, we use an original contrast inspired by the mean square contrast. The first idea is to connect our problem with the regression model. For any function $g$, we can write

$$g(X_{i+1}) = \Big(\int \Pi(\cdot, y)g(y)\,dy\Big)(X_i) + \eta_{i+1}$$

where $\eta_{i+1} = g(X_{i+1}) - E[g(X_{i+1}) \mid X_i]$. Then, for every function $g$, we can consider $\int \Pi(\cdot, y)g(y)\,dy$ as a regression function. The mean square contrast to estimate this regression function, if the $X_i$ were known, would be $(1/n)\sum_{i=1}^n [t^2(X_i) - 2t(X_i)g(X_{i+1})]$. If $\int g^2 = 1$, this contrast can be written

$$\frac{1}{n}\sum_{i=1}^n \Big[\int T^2(X_i, y)\,dy - 2T(X_i, X_{i+1})\Big]$$

by setting $T(x, y) = t(x)g(y)$, i.e. $T$ such that $\int T(x, y)g(y)\,dy = t(x)$. This is the contrast used in Lacour (2007a), but in our case only $Y_1, \dots, Y_{n+1}$



C. Lacour / Estimation of the transition of a hidden Markov chain 






are known. Therefore we introduce in this paper two operators $Q$ and $V$ such that $E[Q_{T^2}(Y_i) \mid X_i] = \int T^2(X_i, y)\,dy$ and $E[V_T(Y_i, Y_{i+1}) \mid X_i, X_{i+1}] = T(X_i, X_{i+1})$. This leads to the following contrast:

$$\gamma_n(T) = \frac{1}{n}\sum_{i=1}^n \big[Q_{T^2}(Y_i) - 2V_T(Y_i, Y_{i+1})\big]. \tag{3}$$

A collection of estimators is then defined by minimization of this contrast on wavelet spaces. Indeed, wavelets have many useful properties; in particular, they can have compact support and be regular enough to balance the smoothness of the noise. A general reference on the subject is the book by Meyer (1990).



A model selection method inspired by Barron et al. (1999) and based on contrast (3) is used to build our final estimator. A data-driven choice of model is performed via the minimization of a penalized criterion: the chosen model is the one which minimizes the empirical risk plus a penalty function. In most cases, when estimating mixing processes, a mixing term appears in this penalty. In the same way, some unknown terms derived from the dependence between the $X_i$ appear in the thresholding constant used to define the estimator of Clémençon (2003). Here a conditioning argument enables us to avoid such a mixing term in the penalty. Our penalty contains only known quantities or terms that can be estimated, and is therefore computable.

For an ordinary smooth noise with regularity $\gamma$, the rate of convergence $n^{-\alpha/(2\alpha+4\gamma+2)}$ is obtained if it is assumed that the transition $\Pi$ belongs to a Besov space with regularity $\alpha$. Our estimator is then better than that of Clémençon (2003), which achieves only the rate $(\ln(n)/n)^{\alpha/(2\alpha+4\gamma+2)}$. Moreover, this rate is obtained without assuming that the regularity $\alpha$ of $\Pi$ is known.

This paper is organized as follows. In Section 2 we present the model and the assumptions. Section 3 is devoted to the definitions of the contrast and of the estimator. The main result and a sketch of proof are given in Section 4. Numerical illustrations through simulated examples are reported in Section 5. The detailed proofs are gathered in Section 6.



2. Study framework
2.1. Notations

For the sake of clarity, we use lowercase letters for dimension 1 and capital letters for dimension 2. For a function $t : \mathbb{R} \to \mathbb{R}$, we denote by $\|t\|$ the $L^2$ norm, that is $\|t\|^2 = \int_{\mathbb{R}} t^2(x)\,dx$. The Fourier transform $t^*$ of $t$ is defined by

$$t^*(u) = \int e^{-ixu}t(x)\,dx.$$

Notice that the function $t$ is the inverse Fourier transform of $t^*$ and can be written $t(x) = \frac{1}{2\pi}\int e^{ixu}t^*(u)\,du$. The convolution product is defined by $(t * s)(x) = \int t(x - y)s(y)\,dy$.
In the same way, for a function $T : \mathbb{R}^2 \to \mathbb{R}$, $\|T\|^2 = \iint_{\mathbb{R}^2} T^2(x, y)\,dx\,dy$ and $T^*(u, v) = \iint e^{-ixu - iyv}T(x, y)\,dx\,dy$.

We denote by $t \otimes s$ the function $(x, y) \mapsto (t \otimes s)(x, y) = t(x)s(y)$.

We will estimate $\Pi$ on a compact set $A = A_1 \times A_2$ only, and we denote by $\|\cdot\|_A$ the norm in $L^2(A)$, i.e.

$$\|T\|_A^2 = \iint_A T^2(x, y)\,dx\,dy.$$



2.2. Assumptions on noise

The Markov chain $(X_i)_{i \ge 1}$ is observed through a noise sequence $(\varepsilon_i)_{i \ge 1}$ of independent and identically distributed random variables. The density of $\varepsilon_i$ is denoted by $q$ and is assumed to be known. We assume that the Fourier transform of $q$ never vanishes and that $q$ is ordinary smooth. More precisely, the assumption on the error density is the following:

H1 $q$ is uniformly bounded and there exist $\gamma > 0$ and $k_0 > 0$ such that $\forall x \in \mathbb{R}$, $|q^*(x)| \ge k_0(x^2 + 1)^{-\gamma/2}$.

This assumption restricts the regularity class of the noise. Among the so-called ordinary smooth noises, we can cite the Laplace distribution, the exponential distribution and all the Gamma or symmetric Gamma distributions. The noise follows a Gamma distribution with scale parameter $\lambda$ and shape parameter $\zeta$ if $q(x) = \lambda^\zeta x^{\zeta - 1}e^{-\lambda x}/\Gamma(\zeta)$ for $x > 0$, with $\Gamma$ the classical Gamma function. Then

$$|q^*(x)| = \Big(1 + \frac{x^2}{\lambda^2}\Big)^{-\zeta/2}.$$

So, for $\zeta \ge 1$, $q$ is bounded and satisfies H1 with $\gamma = \zeta$. The case $\zeta = 1$ corresponds to an exponential distribution, and if $\lambda = 1/2$, $\zeta = p/2$, it is a chi-square $\chi^2(p)$. A Laplace noise is defined in the following way:

$$q(x) = \frac{\lambda}{2}e^{-\lambda|x|} \quad\text{and}\quad q^*(x) = \frac{\lambda^2}{\lambda^2 + x^2}.$$

Then H1 is satisfied with $\gamma = 2$. More generally, we can define the symmetric Gamma distribution with density $q(x) = \lambda^\zeta|x|^{\zeta - 1}e^{-\lambda|x|}/(2\Gamma(\zeta))$. The characteristic function is then

$$q^*(x) = \Big(1 + \frac{x^2}{\lambda^2}\Big)^{-\zeta/2}\cos\Big(2\zeta\arctan\Big(\frac{x}{\lambda + \sqrt{x^2 + \lambda^2}}\Big)\Big)$$

so that H1 is satisfied with $\gamma = \zeta + 1$ if $\zeta$ is an odd integer and $\gamma = \zeta$ otherwise.
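As a quick numerical sanity check of the Laplace example, the sketch below approximates $q^*(x) = \int e^{-ixu}q(u)\,du$ (the Fourier convention of Section 2.1) by a Riemann sum and compares it with the closed form $\lambda^2/(\lambda^2 + x^2)$. It also checks H1 with $\gamma = 2$ and the constant $k_0 = \min(1, \lambda^2)$, for which $\lambda^2(1 + x^2) \ge k_0(\lambda^2 + x^2)$ holds for every $x$. The value $\lambda = 5$, the grid sizes and the helper names are illustrative assumptions.

```python
import numpy as np

LAM = 5.0  # Laplace parameter (the value used in Section 5, illustrative here)

def laplace_q(u, lam=LAM):
    """Laplace density q(u) = (lam / 2) exp(-lam |u|)."""
    return 0.5 * lam * np.exp(-lam * np.abs(u))

def laplace_q_star(x, lam=LAM):
    """Closed-form Fourier transform q*(x) = lam^2 / (lam^2 + x^2)."""
    x = np.asarray(x, dtype=float)
    return lam**2 / (lam**2 + x**2)

def q_star_numeric(x, lam=LAM, num=400001):
    """q*(x) = int exp(-i x u) q(u) du by a Riemann sum on [-40/lam, 40/lam].
    q is even, so the imaginary part vanishes and only cos(x u) is needed."""
    u = np.linspace(-40.0 / lam, 40.0 / lam, num)
    du = u[1] - u[0]
    qu = laplace_q(u, lam)
    return np.array([np.sum(np.cos(xi * u) * qu) * du for xi in np.atleast_1d(x)])

def satisfies_h1(x, gamma=2.0, lam=LAM):
    """Check |q*(x)| >= k0 (x^2 + 1)^(-gamma/2) at the points x, with k0 = min(1, lam^2)."""
    x = np.asarray(x, dtype=float)
    k0 = min(1.0, lam**2)
    return bool(np.all(np.abs(laplace_q_star(x, lam)) >= k0 * (x**2 + 1) ** (-gamma / 2)))
```

A grid check such as `satisfies_h1(np.linspace(-100, 100, 2001))` is of course only a spot check; the inequality itself follows from the one-line algebra above.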
Remark 1. We have to point out that Gaussian noise does not satisfy Assumption H1. Indeed, an exponential decrease of the Fourier transform of the error density is more difficult to control, and a supersmooth noise makes denoising more difficult. For that reason, many authors, among whom Butucea, Koo and Lee, or Youndjé and Wells, have considered only ordinary smooth noise. The method used in this paper does not allow dealing with supersmooth noise. Indeed, it requires a wavelet basis that is more regular than the noise and has compact support (because of Assumption H4 below), which is impossible when the noise is supersmooth.



2.3. Assumptions on the chain

The hypotheses on the hidden Markov chain $(X_i)_{i \ge 1}$ are the following:

H2 The chain is irreducible, positive recurrent and stationary with unknown density $f$.

H3 There exists a positive real $f_0$ such that, for all $x$ in $A_1$, $f_0 \le f(x) \le \|f\|_{\infty, A_1} < \infty$.

H4 The transition density $\Pi$ is bounded on $A$ by $\|\Pi\|_{\infty, A} < \infty$.

H5 The process $(X_k)$ is geometrically $\beta$-mixing ($\beta_q \le e^{-\theta q}$), or arithmetically $\beta$-mixing ($\beta_q \le q^{-\theta}$) with $\theta > 8$, where

$$\beta_q = \int \|P^q(x, \cdot) - \mu\|_{TV} f(x)\,dx$$

with $P^q(x, \cdot)$ the distribution of $X_{i+q}$ given $X_i = x$, $\mu$ the stationary distribution and $\|\cdot\|_{TV}$ the total variation distance.



We refer to Doukhan (1994) for details on $\beta$-mixing. Assumption H5 implies that the process $(Y_k)$ is $\beta$-mixing, with $\beta$-mixing coefficients smaller than those of $(X_k)$. Assumption H3 is common (but restrictive) and is crucial to control the empirical processes brought into play. Many processes satisfy Assumptions H2-H5, such as autoregressive processes, diffusions or ARCH processes. These examples are detailed in Lacour (2007a).



3. Estimation procedure
3.1. Projection spaces

Here we describe the projection spaces that we use to estimate the transition $\Pi$. We consider an increasing sequence of spaces, indexed by $m$, to construct a collection of estimators. For the sake of simplicity, we assume that $A = [0, 1]^2$. We use a compactly supported wavelet basis on the interval $[0, 1]$, described in Cohen et al. (1993). The construction provides a set of functions $(\phi_k)$ for $k = 0, \dots, 2^J - 1$, with $J$ a fixed level, and, for all $j > J$, a set of functions $(\psi_{jk})$, $k = 0, \dots, 2^j - 1$. The collection of these functions forms a complete orthonormal system on $[0, 1]$. Then, for $u$ in $L^2([0, 1])$, we can write

$$u(x) = \sum_{k=0}^{2^J - 1}\langle u, \phi_k\rangle\phi_k(x) + \sum_{j > J}\sum_{k=0}^{2^j - 1}\langle u, \psi_{jk}\rangle\psi_{jk}(x).$$

Actually

$$\phi_k(x) = \begin{cases} 2^{J/2}\phi^0(2^J x - k) & \text{if } k = 0, \dots, N - 1,\\ 2^{J/2}\phi(2^J x - k) & \text{if } k = N, \dots, 2^J - N - 1,\\ 2^{J/2}\phi^1(2^J x - k) & \text{if } k = 2^J - N, \dots, 2^J - 1, \end{cases}$$

where $\phi$ is a Daubechies father wavelet with support $[-N + 1, N]$, and $\phi^0$ and $\phi^1$ are edge wavelets explicitly constructed in Cohen et al. (1993). The functions $\phi_k$ have support $[(k - N + 1)/2^J, (k + N)/2^J] \cap [0, 1]$. For $r$ a positive real, $N$ is chosen large enough so that $\phi$ has regularity $r$ (in the sense defined in (4)): this is possible since the smoothness of the Daubechies wavelets increases linearly with $N$. We choose $J$ such that $2^J \ge 2N$, so that the two edges do not interact (no overlap between $\phi^0$ and $\phi^1$). The construction ensures that $\phi^0$ and $\phi^1$ are also of regularity $r$. In the same way, for each level $j$, the $\psi_{jk}$ are dilations and translations of functions $\psi$, $\psi^0$ and $\psi^1$ with regularity $r$.

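Now we construct a wavelet basis of $L^2([0, 1]^2)$ by the tensor product method (see Meyer (1990), Chapter 3, Section 3). The father wavelet is $\phi \otimes \phi$ and the mother wavelets are $\phi \otimes \psi$, $\psi \otimes \phi$ and $\psi \otimes \psi$. A function $T$ in $L^2([0, 1]^2)$ can then be written

$$T(x, y) = \sum_{k=0}^{2^J - 1}\sum_{l=0}^{2^J - 1} a_{kl}\,\phi_k(x)\phi_l(y) + \sum_{j > J}\sum_{k=0}^{2^j - 1}\sum_{l=0}^{2^j - 1}\big[b_{jkl}\,\phi_{jk}(x)\psi_{jl}(y) + c_{jkl}\,\psi_{jk}(x)\phi_{jl}(y) + d_{jkl}\,\psi_{jk}(x)\psi_{jl}(y)\big].$$

For the sake of simplicity, we adopt the following notation:

$$T(x, y) = \sum_{j \ge J}\sum_{(k, l) \in \Lambda_j} a_{jkl}\,\varphi_{jk}(x)\varphi_{jl}(y)$$

where $\varphi_{jk}(x) = 2^{j/2}\varphi(2^j x - k)$ with $\varphi = \phi, \phi^0, \phi^1, \psi, \psi^0$ or $\psi^1$ according to the values of $j$ and $k$. For $j > J$, $\Lambda_j$ is a set with cardinal $3 \cdot 2^{2j}$, and $\Lambda_J$ is a set with cardinal $2^{2J}$. In the rest of this paper we will use the following property of $\varphi$, deriving from the regularity of the initial Daubechies wavelet: there exists a positive constant $k_\varphi$ such that

$$\forall u \in \mathbb{R}, \quad |\varphi^*(u)| \le k_\varphi(u^2 + 1)^{-r/2}. \tag{4}$$

Now, for $m \ge J$, we can consider the space

$$S_m = \Big\{T : \mathbb{R}^2 \to \mathbb{R},\ T(x, y) = \sum_{j=J}^{m}\sum_{(k, l) \in \Lambda_j} a_{jkl}\,\varphi_{jk}(x)\varphi_{jl}(y)\Big\}.$$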
Note that the functions in $S_m$ are all supported in $[0, 1]^2$. The dimension of the space $S_m$ is $D_m^2 = 2^{2J} + 3\sum_{j=J+1}^{m} 2^{2j} \in [2^{2m}, 2^{2m+2})$. We denote by $\mathbb{S}$ the space $S_{m_0}$ with the greatest dimension $D_{m_0}^2 = \mathcal{D}^2$ such that $\mathcal{D}^{4\gamma+2} \le n$; it is the maximal space that we consider. The spaces $S_m$ have the following properties:

P1 $m' \le m \Rightarrow S_{m'} \subset S_m$.

P2 $\big\|\sum_{j,k,l} a_{jkl}\,\varphi_{jk} \otimes \varphi_{jl}\big\|^2 = \sum_{j,k,l} a_{jkl}^2$. This property derives from the orthonormality of the basis.

Now, for every function $t : \mathbb{R} \to \mathbb{R}$, let $v_t$ be the inverse Fourier transform of $t^*/q^*(-\cdot)$, i.e.

$$v_t(x) = \frac{1}{2\pi}\int e^{ixu}\frac{t^*(u)}{q^*(-u)}\,du.$$

This operator is introduced because it satisfies $E[v_t(Y_k) \mid X_k] = t(X_k)$ for every function $t$. We can now state the following lemma:

Lemma 1. If $r > \gamma + 2$, there exists $\Phi_1 > 0$ such that

P3 ||£ 3 wE fc ¥&IU<4iA* 
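P4 $\big\|\sum_k |v_{\varphi_{jk}}|^2\big\|_\infty \le \Phi_1(2^j)^{2\gamma+2}$

P5 $\sum_k \|v_{\varphi_{jk}}\|^2 \le \Phi_1(2^j)^{2\gamma+1}$

P6 $\big\|\sum_{k,k'} |v_{\varphi_{jk}\varphi_{jk'}}|^2\big\|_\infty \le \Phi_1(2^j)^{2\gamma+3}$

P7 $\sum_{k,k'} \|v_{\varphi_{jk}\varphi_{jk'}}\|^2 \le \Phi_1(2^j)^{2\gamma+2}$

This lemma is proved in Section 6.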
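For Laplace noise the operator $v$ has an explicit form: since $1/q^*(-u) = 1 + u^2/\lambda^2$ and, with the convention of Section 2.1, $(t'')^*(u) = -u^2t^*(u)$, we get $v_t = t - t''/\lambda^2$. The sketch below checks numerically the defining property $E[v_t(Y_k) \mid X_k = x] = t(x)$, i.e. $\int v_t(x + e)q(e)\,de = t(x)$. The test function $t(x) = e^{-x^2/2}$, the value $\lambda = 5$ and the helper names are illustrative assumptions.

```python
import numpy as np

LAM = 5.0  # Laplace noise parameter (illustrative value)

def t(x):
    """Illustrative smooth function t(x) = exp(-x^2/2)."""
    return np.exp(-x**2 / 2)

def t_second(x):
    """Second derivative t''(x) = (x^2 - 1) exp(-x^2/2)."""
    return (x**2 - 1) * np.exp(-x**2 / 2)

def v_t(x, lam=LAM):
    """For Laplace noise, 1/q*(-u) = 1 + u^2/lam^2, hence v_t = t - t''/lam^2."""
    return t(x) - t_second(x) / lam**2

def smoothed_v_t(x, lam=LAM, num=200001):
    """E[v_t(x + eps)] for eps ~ Laplace(lam), by a Riemann sum on [-20/lam, 20/lam]."""
    e = np.linspace(-20.0 / lam, 20.0 / lam, num)
    de = e[1] - e[0]
    q = 0.5 * lam * np.exp(-lam * np.abs(e))
    return float(np.sum(v_t(x + e, lam) * q) * de)
```

For instance `smoothed_v_t(0.5)` agrees with `t(0.5)` up to quadrature error, even though `v_t(0.5)` itself differs from `t(0.5)`.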



3.2. Construction of a contrast 



Now let us estimate the transition density of the Markov chain by minimizing 
a contrast. This section is devoted to the definition of this contrast. We explain 
here how it can be obtained, first by considering the case without noise. 



3.2.1. First step: if $X_1, \dots, X_{n+1}$ were observed

We present here a heuristic to understand why we choose this contrast, assuming first that the $(X_i)$ are known. For every function $g$, the definition of the transition density implies $E[g(X_{i+1}) \mid X_i] = \int \Pi(X_i, y)g(y)\,dy$, so that we can write

$$g(X_{i+1}) = \Big(\int \Pi(\cdot, y)g(y)\,dy\Big)(X_i) + \eta_{i+1}$$

where $\eta_{i+1} = g(X_{i+1}) - E[g(X_{i+1}) \mid X_i]$ is a centered process. We then recognize a regression model. A contrast to estimate $\int \Pi(\cdot, y)g(y)\,dy$ is

$$\gamma_n(u) = \frac{1}{n}\sum_{i=1}^n\big[u^2(X_i) - 2u(X_i)g(X_{i+1})\big].$$
It is the classical mean square contrast to estimate a regression function. But we want to estimate $\Pi(\cdot, y)$ and not only $\int \Pi(\cdot, y)g(y)\,dy$.

So we observe that if $\int g^2 = 1$ and $T(x, y) = u(x)g(y)$, then $u(\cdot) = \int T(\cdot, y)g(y)\,dy$. So if $u(\cdot) = \int T(\cdot, y)g(y)\,dy$ estimates $\int \Pi(\cdot, y)g(y)\,dy$, it is natural to consider that $T$ estimates $\Pi$. Since $\int T^2(\cdot, y)\,dy = u^2(\cdot)$, the contrast becomes

$$\gamma_n(T) = \frac{1}{n}\sum_{i=1}^n\Big[\int T^2(X_i, y)\,dy - 2T(X_i, X_{i+1})\Big].$$

It is the contrast studied in Lacour (2007a), and it allows for a good estimation of $\Pi(\cdot, y)$ when the Markov chain is observed. We can observe that

$$E\gamma_n(T) = \iint T^2(x, y)f(x)\,dx\,dy - 2\iint T(x, y)f(x)\Pi(x, y)\,dx\,dy = \|T - \Pi\|_f^2 - \|\Pi\|_f^2$$

where $f$ is the density of $X_1$ and

$$\|T\|_f = \Big(\iint T^2(x, y)f(x)\,dx\,dy\Big)^{1/2}.$$

Then this contrast is an empirical counterpart of the distance $\|T - \Pi\|_f$.
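To see the noise-free contrast at work, here is a minimal sketch under simplifying assumptions that are not those of the paper: the $X_i$ are observed, $A = [0, 1]^2$, and the wavelet spaces are replaced by the histogram basis $\phi_k = \sqrt{K}\,\mathbf{1}_{[k/K, (k+1)/K)}$ (Haar father wavelets at a single level, with $K$ = `n_bins`). For this basis $\phi_k\phi_{k'} = 0$ when $k \ne k'$, so minimizing $\gamma_n(T)$ over $T = \sum_{k,l} a_{kl}\phi_k \otimes \phi_l$ decouples into $a_{kl} = Z_{kl}/G_{kk}$, with $G_{kk} = \frac{1}{n}\sum_i \phi_k^2(X_i)$ and $Z_{kl} = \frac{1}{n}\sum_i \phi_k(X_i)\phi_l(X_{i+1})$. The function name is hypothetical.

```python
import numpy as np

def histogram_transition_estimator(x, n_bins):
    """Minimizer of the noise-free contrast over the histogram space on [0,1]^2.

    x: observed chain X_1, ..., X_{n+1} with values in [0, 1] (noise-free case).
    Returns an (n_bins, n_bins) array: the value of the estimator on each cell.
    """
    x = np.asarray(x, dtype=float)
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)  # bin index of each X_i
    cur, nxt = bins[:-1], bins[1:]                             # pairs (X_i, X_{i+1})
    n = cur.size
    # G is diagonal for this basis: G_kk = (1/n) sum_i phi_k(X_i)^2
    g_diag = n_bins * np.bincount(cur, minlength=n_bins) / n
    # Z_kl = (1/n) sum_i phi_k(X_i) phi_l(X_{i+1})
    z = np.zeros((n_bins, n_bins))
    np.add.at(z, (cur, nxt), n_bins / n)
    # a_kl = Z_kl / G_kk; rows of bins that were never visited are left at 0
    coef = np.divide(z, g_diag[:, None], out=np.zeros_like(z), where=g_diag[:, None] > 0)
    # On cell (k, l), phi_k(x) phi_l(y) = n_bins, hence the value a_kl * n_bins
    return coef * n_bins
```

On each visited row, the value is `n_bins` times the empirical frequency of the transitions from bin $k$ to bin $l$, so every estimated row integrates to 1 over $[0, 1]$.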



3.2.2. Second step: the $X_i$'s are unknown, the observations are the $Y_i$'s

The aim of this step is to modify the previous contrast to take into account that the $X_i$'s are not observed. To do this, we use the same technique as in the convolution problem (see Comte et al. (2006)). Let us denote by $F_X$ the density of $(X_i, X_{i+1})$ and by $F_Y$ the density of $(Y_i, Y_{i+1})$. We remark that $F_Y = F_X * (q \otimes q)$ and $F_Y^* = F_X^*(q^* \otimes q^*)$, and then

$$E[T(X_i, X_{i+1})] = \iint T F_X = \frac{1}{4\pi^2}\iint T^*\,\overline{F_X^*} = \frac{1}{4\pi^2}\iint \frac{T^*}{\overline{q^* \otimes q^*}}\,\overline{F_Y^*}$$

by using the Parseval equality. The idea is then to define $V_T^* = T^*/\overline{q^* \otimes q^*}$ so that

$$E[T(X_i, X_{i+1})] = \frac{1}{4\pi^2}\iint V_T^*\,\overline{F_Y^*} = \iint V_T F_Y = E[V_T(Y_i, Y_{i+1})].$$

Then we replace the term $T(X_i, X_{i+1})$ in the contrast by $V_T(Y_i, Y_{i+1})$. In the same way, we find an operator $Q$ to replace the term $\int T^2(X_i, y)\,dy$. More precisely, since $q$ is real-valued we have $\overline{q^*(u)} = q^*(-u)$, so that, for every function $T$, $V_T$ is the inverse Fourier transform of $T^*/(q^* \otimes q^*)(-\cdot)$, i.e.

$$V_T(x, y) = \frac{1}{4\pi^2}\iint e^{ixu + iyv}\frac{T^*(u, v)}{q^*(-u)q^*(-v)}\,du\,dv.$$

Let $Q_T$ be the inverse Fourier transform of $T^*(\cdot, 0)/q^*(-\cdot)$, i.e.

$$Q_T(x) = \frac{1}{2\pi}\int e^{ixu}\frac{T^*(u, 0)}{q^*(-u)}\,du.$$

V and Q have been chosen so that the following lemma holds. 
Lemma 2. For all $k \in \{1, \dots, n\}$:

1. $E[V_T(Y_k, Y_{k+1}) \mid X_1, \dots, X_{n+1}] = T(X_k, X_{k+1})$

2. $E[V_T(Y_k, Y_{k+1})] = \iint T(x, y)\Pi(x, y)f(x)\,dx\,dy$

3. $E[Q_T(Y_k) \mid X_1, \dots, X_{n+1}] = \int T(X_k, y)\,dy$

4. $E[Q_T(Y_k)] = \iint T(x, y)f(x)\,dx\,dy$

Points 1 and 3 are proved in Section 6; the other assertions are their immediate consequences. Note that $V$ and $Q$ are strongly linked with $v$. In particular, $V_{s \otimes t}(x, y) = v_s(x)v_t(y)$ and $Q_{s \otimes t}(x) = v_s(x)\int t(y)\,dy$.
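The identity $V_{s \otimes t}(x, y) = v_s(x)v_t(y)$ makes point 2 of Lemma 2 easy to check by simulation. The sketch below is only an illustration under assumed choices: a hidden Gaussian AR(1) chain with coefficient 0.5 and stationary law $N(0, 1)$, Laplace noise with $\lambda = 5$, and $T = s \otimes s$ with $s(x) = e^{-x^2/2}$, for which $v_s = s - s''/\lambda^2$. It compares the empirical mean of $T(X_k, X_{k+1})$, computed from the hidden chain, with that of $V_T(Y_k, Y_{k+1})$, computed from the observations only. Both estimate $\iint T(x, y)\Pi(x, y)f(x)\,dx\,dy$, which here equals $\det(I_2 + \Sigma)^{-1/2} = 3.75^{-1/2} \approx 0.516$, with $\Sigma$ the covariance matrix of $(X_k, X_{k+1})$. The helper names are hypothetical.

```python
import numpy as np

LAM = 5.0  # Laplace noise parameter (illustrative)

def s(x):
    """Illustrative test function s(x) = exp(-x^2/2)."""
    return np.exp(-x**2 / 2)

def v_s(x, lam=LAM):
    """v_s = s - s''/lam^2 for Laplace noise, with s''(x) = (x^2 - 1) exp(-x^2/2)."""
    return s(x) - (x**2 - 1) * np.exp(-x**2 / 2) / lam**2

def compare_means(n=200_000, a=0.5, lam=LAM, seed=0):
    """Return (mean of T(X_k, X_{k+1}), mean of V_T(Y_k, Y_{k+1})) for T = s (x) s."""
    rng = np.random.default_rng(seed)
    x = np.empty(n + 1)
    x[0] = rng.standard_normal()                   # stationary N(0, 1) start
    innov = rng.normal(0.0, np.sqrt(1 - a**2), n)  # keeps Var(X_k) = 1
    for k in range(n):
        x[k + 1] = a * x[k] + innov[k]
    y = x + rng.laplace(0.0, 1.0 / lam, n + 1)     # observations, model (1)
    hidden = np.mean(s(x[:-1]) * s(x[1:]))          # uses the hidden chain
    observed = np.mean(v_s(y[:-1], lam) * v_s(y[1:], lam))  # uses Y only
    return hidden, observed
```

Replacing `v_s` by `s` in the `observed` line would give a biased quantity, since the noise smooths $s$; the deconvolution operator removes that bias in expectation.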

By using the operators $V$ and $Q$, we now define the contrast, which depends only on the observations $Y_1, \dots, Y_{n+1}$:

$$\gamma_n(T) = \frac{1}{n}\sum_{k=1}^n\big[Q_{T^2}(Y_k) - 2V_T(Y_k, Y_{k+1})\big].$$

With Lemma 2 we compute $E(\gamma_n(T)) = \iint T^2(x, y)f(x)\,dx\,dy - 2\iint T(x, y)\Pi(x, y)f(x)\,dx\,dy = \|T - \Pi\|_f^2 - \|\Pi\|_f^2$. So we want to estimate $\Pi$ by minimizing $\gamma_n$. The definition of the contrast leads to the following "empirical norm":

$$\Psi_n(T) = \frac{1}{n}\sum_{k=1}^n Q_{T^2}(Y_k).$$

The term empirical norm is used because $E\Psi_n(T) = \|T\|_f^2$, but $\Psi_n$ is not a norm in the usual sense of the word.



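3.3. Definition of the estimator

We have to minimize the contrast $\gamma_n$ to find our estimator. By writing $T = \sum_{j=J}^{m}\sum_{(k, l) \in \Lambda_j} a_{jkl}\,\varphi_{jk} \otimes \varphi_{jl} = \sum_\lambda a_\lambda W_\lambda(x, y)$, we obtain

$$\frac{\partial \gamma_n(T)}{\partial a_{\lambda_0}} = \frac{2}{n}\sum_{i=1}^n\Big[\sum_\lambda a_\lambda Q_{W_\lambda W_{\lambda_0}}(Y_i) - V_{W_{\lambda_0}}(Y_i, Y_{i+1})\Big].$$

Then, denoting by $A_m$ the vector of the coefficients $a_\lambda$ of $T$,

$$\forall \lambda_0, \ \frac{\partial \gamma_n(T)}{\partial a_{\lambda_0}} = 0 \iff G_m A_m = Z_m \tag{5}$$

where

$$G_m = \Big(\frac{1}{n}\sum_{i=1}^n Q_{W_\lambda W_\mu}(Y_i)\Big)_{\lambda, \mu} \quad\text{and}\quad Z_m = \Big(\frac{1}{n}\sum_{i=1}^n V_{W_\lambda}(Y_i, Y_{i+1})\Big)_\lambda.$$

But the matrix $G_m$ is not necessarily invertible. This is why we introduce the set

$$\Gamma = \Big\{\min \mathrm{Sp}(G_m) \ge \frac{2}{3}f_0\Big\} \tag{6}$$

where $\mathrm{Sp}$ denotes the spectrum, i.e. the set of the eigenvalues of the matrix, and $f_0$ is the lower bound of $f$ on $A_1$. On $\Gamma$, $G_m$ is invertible and $\gamma_n$ is convex, so that the minimization of $\gamma_n$ is equivalent to Equation (5) and admits the solution $A_m = G_m^{-1}Z_m$. Now we can define

$$\hat\Pi_m = \begin{cases} \arg\min_{T \in S_m}\gamma_n(T) & \text{on } \Gamma,\\ 0 & \text{on } \Gamma^c. \end{cases}$$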



Remark 2. The term 2/3 in $\Gamma$ can be replaced by any constant smaller than 1. Moreover, the construction of $\hat\Pi_m$ described here requires the knowledge of $f_0$. Nevertheless, when $f_0$ is unknown, we can replace it by an estimator $\hat f_0$ defined as the minimum of an estimator of $f$ (for an estimator of the density of a hidden Markov chain, see Lacour). The result is then unchanged if $f$ is regular enough and the mixing rate is high enough.

We then have an estimator of $\Pi$ for every space $S_m$. But we have to choose the best model $m$ to obtain an estimator that achieves the best rate of convergence, whatever the regularity of $\Pi$. So we set

$$\hat m = \arg\min_{m \in \mathcal{M}_n}\{\gamma_n(\hat\Pi_m) + \mathrm{pen}(m)\}$$

where pen is a penalty function to be specified later and

$$\mathcal{M}_n = \{m \ge J, \ D_m^{4\gamma+2} \le n\}.$$

Then we can define our final estimator:

$$\tilde\Pi = \begin{cases} \hat\Pi_{\hat m} & \text{if } \|\hat\Pi_{\hat m}\| \le k_n, \text{ with } k_n = n^{1/2},\\ 0 & \text{otherwise.} \end{cases}$$
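The selection step, together with the event $\Gamma$ of (6) and the final threshold $k_n = n^{1/2}$, can be sketched as follows once the matrices $G_m$ and $Z_m$ of (5) are available for each $m \in \mathcal{M}_n$. This is a hedged illustration, not the author's code: the helper name `select_model` and its dictionary input are assumptions. It relies on two facts from the text: at $A_m = G_m^{-1}Z_m$ the contrast equals $\gamma_n(\hat\Pi_m) = A_m^\top G_m A_m - 2A_m^\top Z_m = -A_m^\top Z_m$, and by P2 the norm $\|\hat\Pi_m\|$ is the Euclidean norm of $A_m$.

```python
import numpy as np

def select_model(models, pen, f0, n):
    """Penalized model selection sketch (hypothetical helper).

    models: dict mapping m to (G_m, Z_m) as in Equation (5)
    pen:    callable m -> pen(m)
    f0:     lower bound of f on A_1 (or an estimate of it)
    Returns (m_hat, coefficient vector of the final estimator).
    """
    best = None
    for m, (g, z) in models.items():
        g = np.asarray(g, dtype=float)
        z = np.asarray(z, dtype=float)
        # Event Gamma of (6): smallest eigenvalue of the symmetric G_m >= (2/3) f0
        if np.linalg.eigvalsh(g).min() >= 2.0 / 3.0 * f0:
            a = np.linalg.solve(g, z)       # A_m = G_m^{-1} Z_m
        else:
            a = np.zeros_like(z)            # hat Pi_m = 0 on Gamma^c
        crit = -a @ z + pen(m)              # gamma_n(hat Pi_m) + pen(m)
        if best is None or crit < best[1]:
            best = (m, crit, a)
    m_hat, _, a_hat = best
    # By P2, the L2 norm of hat Pi_m is the Euclidean norm of its coefficients
    if np.linalg.norm(a_hat) > np.sqrt(n):  # threshold k_n = n^{1/2}
        a_hat = np.zeros_like(a_hat)
    return m_hat, a_hat
```

Section 5 reports that, in practice, $G_m$ was simply inverted without testing $\Gamma$; the eigenvalue check is kept here only to mirror the theoretical definition.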



4. Result

4.1. Risk and rate of convergence

For a function $G$ and a subspace $\mathcal{S}$, we define

$$d_A(G, \mathcal{S}) = \inf_{T \in \mathcal{S}}\|G - T\|_A.$$

We recall that $A$ is the estimation area. For each estimator $\hat\Pi_m$, we have the following decomposition of the risk:

Proposition 1. We consider a Markov chain and a noise satisfying Assumptions H1-H5 with $\gamma > 3/4$. For $m$ fixed in $\mathcal{M}_n$, we consider $\hat\Pi_m$ the estimator of the transition density $\Pi$ previously described. Then there exists $C > 0$ such that

$$E\|\hat\Pi_m - \Pi\|_A^2 \le C\Big[d_A^2(\Pi, S_m) + \frac{D_m^{4\gamma+2}}{n}\Big].$$
We do not prove this proposition because this result is included in Theorem 1 below, which is proved in Section 6.

Now, if $\Pi$ belongs to a Besov space with regularity $\alpha$, it is a common approximation property of the wavelet spaces that $d_A(\Pi, S_m) \le CD_m^{-\alpha}$. So, choosing $m_1$ such that $D_{m_1} = n^{1/(2\alpha+4\gamma+2)}$, we obtain the minimum risk

$$E\|\hat\Pi_{m_1} - \Pi\|_A^2 \le Cn^{-\frac{2\alpha}{2\alpha+4\gamma+2}}.$$
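The exponent above can be recovered by equating the two terms of the risk bound of Proposition 1. A short worked derivation, assuming the approximation bound $d_A^2(\Pi, S_m) \le CD_m^{-2\alpha}$ stated above:

$$D_m^{-2\alpha} = \frac{D_m^{4\gamma+2}}{n} \iff D_m^{2\alpha+4\gamma+2} = n \iff D_m = n^{\frac{1}{2\alpha+4\gamma+2}},$$

$$\text{and then}\quad D_{m_1}^{-2\alpha} = \frac{D_{m_1}^{4\gamma+2}}{n} = n^{-\frac{2\alpha}{2\alpha+4\gamma+2}}.$$

For instance, with Laplace noise ($\gamma = 2$) and $\alpha = 2$, the bound on the squared risk decreases like $n^{-4/14} = n^{-2/7}$.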

But this choice of $m_1$ is impossible if $\alpha$ is unknown (which is a priori the case, since $\Pi$ is unknown). That is why we have built our estimator $\tilde\Pi$ via model selection. Now we can state the following theorem.

Theorem 1. We consider a Markov chain and a noise that satisfy Assumptions H1-H5 with $\gamma > 3/4$. We consider $\tilde\Pi$ the estimator of the transition density $\Pi$ previously described, with $r > 2\gamma + 3/2$ and

$$\mathrm{pen}(m) = K\frac{D_m^{4\gamma+2}}{n} \quad\text{for some } K \ge K_0,$$

where $K_0 = C(\gamma)\Phi_1^2\|q\|_\infty^2 f_0^{-1}$. Then there exists $C' > 0$ such that

$$E\|\tilde\Pi - \Pi\|_A^2 \le C\inf_{m \in \mathcal{M}_n}\{d_A^2(\Pi, S_m) + \mathrm{pen}(m)\} + \frac{C'}{n}$$

with C = max(2 + 72/ - 1 ||/|| oo , Al (l + 2||n|| 2 4 ), 12^(1 + 2\\U\\ 2 A )). 

Note that this result is non-asymptotic. This is an advantage of the least squares method over the quotient method.

Not all the constants on which the penalty depends have the same status. The constants $\Phi_1$, $\gamma$ and $\|q\|_\infty$ are known, since the wavelet basis and the noise distribution are known. The constant $f_0$ is unknown, but it can be estimated (see Remark 2). Then, even if it means replacing $f_0$ by an estimator $\hat f_0$, the penalty is computable. In particular, the dependence coefficients of the sequence do not appear at all in the penalty.

The condition $\gamma > 3/4$ is due to an additional term of order $D_m^{2\gamma+7/2}/n$ inside the penalty (coming from the term $(1/n)\sum_{k=1}^n Q_{T^2}(Y_k)$ in the contrast). If $\gamma > 3/4$, then $2\gamma + 7/2 < 4\gamma + 2$ and $D_m^{4\gamma+2}/n$ is the dominant term. If $\gamma = 3/4$, the result still holds, but the constant in the penalty also depends on $\|\Pi\|_A$. In the other cases the estimation is possible, but the term $D_m^{2\gamma+7/2}/n$ is no longer negligible and the order of the variance (and consequently the rate of convergence) must be changed. This constraint $\gamma > 3/4$ is not very restrictive, since $\gamma$ must be larger than $1/2$ in order for $q$ to be square integrable. Moreover, in the case of a Gamma noise, $q$ is not bounded if $\gamma < 1$.

We can now evaluate the rate of convergence of our estimator. 

Corollary 1. We suppose that the restriction of $\Pi$ to $A$ belongs to the Besov space $B^\alpha_{2,\infty}(A)$ with $\alpha < r$. Then, under the assumptions of Theorem 1,

$$E\|\tilde\Pi - \Pi\|_A^2 = O\big(n^{-\frac{2\alpha}{2\alpha+4\gamma+2}}\big).$$
To our knowledge, the minimax rates are unknown in the specific estimation problem we consider here, and finding them is definitely beyond the scope of this paper. Nevertheless, Clémençon (2003) proved that the rate $n^{-\frac{2\alpha}{2\alpha+4\gamma+2}}$ is optimal whenever $f$ and $f\Pi$ belong to $B^\alpha_{2,\infty}(\mathbb{R})$ and $B^\alpha_{2,\infty}(\mathbb{R}^2)$ respectively.

We also remark that we obtain with $\tilde\Pi$ the same rate of convergence as with $\hat\Pi_{m_1}$, where $D_{m_1} = n^{1/(2\alpha+4\gamma+2)}$, but without requiring the knowledge of $\alpha$. Moreover, our estimator is better than that of Clémençon (2003), which achieves only the rate $(\ln(n)/n)^{\frac{2\alpha}{2\alpha+4\gamma+2}}$. It is also an improvement on the result of Lacour (2007b), because this rate is obtained without requiring any regularity for $f$ or $f\Pi$.

If we compare the quotient method described in Lacour (2007b) with the one introduced in this paper, we can say that only the quotient method allows dealing with supersmooth distributions, at least from a theoretical point of view. However, the least squares method has the advantage of giving a good rate of convergence without requiring prior information on the stationary density. Moreover, our result is non-asymptotic, contrary to the one obtained with the quotient method.



4.2. Sketch of proof of Theorem 1

We give in this section a sketch of the proof of Theorem 1.

Let $m \in \mathcal{M}_n$. We denote by $\Pi_m$ the orthogonal projection of $\Pi$ on $S_m$. We have the following bias-variance decomposition:

$$E\|\tilde\Pi - \Pi\|_A^2 = E\|\tilde\Pi - \Pi_m\|_A^2 + \|\Pi_m - \Pi\|_A^2.$$

The term $\|\tilde\Pi - \Pi_m\|_A^2$ can be written in the following way:

$$\|\tilde\Pi - \Pi_m\|_A^2 = \|\hat\Pi_{\hat m} - \Pi_m\|_A^2\,\mathbf{1}_{\{\|\hat\Pi_{\hat m}\| \le k_n\}} + \|\Pi_m\|_A^2\,\mathbf{1}_{\{\|\hat\Pi_{\hat m}\| > k_n\}} \le \|\hat\Pi_{\hat m} - \Pi_m\|_A^2 + \|\Pi_m\|_A^2\,\mathbf{1}_{\{\|\hat\Pi_{\hat m}\| > k_n\}}$$

since $\tilde\Pi = 0$ on the set $\{\|\hat\Pi_{\hat m}\| > k_n\}$ and $\tilde\Pi = \hat\Pi_{\hat m}$ on its complement. The term $\|\Pi_m\|_A^2\,\mathbf{1}_{\{\|\hat\Pi_{\hat m}\| > k_n\}}$ is easily dealt with; the main term is $\|\hat\Pi_{\hat m} - \Pi_m\|_A^2$. But, on $\Gamma$, the definitions of $\hat\Pi_m$ and $\hat m$ lead to the inequality

$$\gamma_n(\hat\Pi_{\hat m}) + \mathrm{pen}(\hat m) \le \gamma_n(\Pi_m) + \mathrm{pen}(m). \tag{7}$$

Letting $Z_{n,m}(T) = \frac{1}{n}\sum_{k=1}^n\{V_T(Y_k, Y_{k+1}) - Q_{T\Pi_m}(Y_k)\}$, a short computation gives

$$\gamma_n(\hat\Pi_{\hat m}) - \gamma_n(\Pi_m) = \Psi_n(\hat\Pi_{\hat m} - \Pi_m) - 2Z_{n,m}(\hat\Pi_{\hat m} - \Pi_m)$$

so that (7) becomes

$$\Psi_n(\hat\Pi_{\hat m} - \Pi_m) \le 2Z_{n,m}(\hat\Pi_{\hat m} - \Pi_m) + \mathrm{pen}(m) - \mathrm{pen}(\hat m) \le 2\|\hat\Pi_{\hat m} - \Pi_m\|_f\sup_{T \in B_f(m, \hat m)} Z_{n,m}(T) + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$$

where $B_f(m, \hat m) = \{T \in S_m + S_{\hat m}, \ \|T\|_f = 1\}$. The main steps of the proof are then:

1. to control the term $\sup_{T \in B_f(m, \hat m)} Z_{n,m}(T)$,

2. to link the empirical "norm" $\Psi_n$ with the $L^2$ norm.

• To deal with the supremum of the empirical process $Z_{n,m}(T)$, we use an inequality of Talagrand stated in Lemma 5 (Section 6.8). This inequality is very powerful but can be applied only to a sum of independent random variables. That is why we split $Z_{n,m}(T)$ into three processes plus a bias term:

$$Z_{n,m}(T) = Z_n^{(1)}(T) - Z_n^{(2)}(T) + Z_n^{(3)}(T) + \iint T(x, y)(\Pi - \Pi_m)(x, y)f(x)\,dx\,dy$$

with

$$Z_n^{(1)}(T) = \frac{1}{n}\sum_{k=1}^n\big\{V_T(Y_k, Y_{k+1}) - E[V_T(Y_k, Y_{k+1}) \mid X_1, \dots, X_{n+1}]\big\}$$

$$Z_n^{(2)}(T) = \frac{1}{n}\sum_{k=1}^n\big\{Q_{T\Pi_m}(Y_k) - E[Q_{T\Pi_m}(Y_k)]\big\}$$

$$Z_n^{(3)}(T) = \frac{1}{n}\sum_{k=1}^n\big\{T(X_k, X_{k+1}) - E[T(X_k, X_{k+1})]\big\}.$$

For the first process $Z_n^{(1)}$, we are back to independent variables by remarking that, conditionally on $X_1, \dots, X_{n+1}$, the pairs $(Y_{2i-1}, Y_{2i})$ are independent (see Proposition 3).

For the other processes, we use the mixing assumption H5 to build auxiliary variables $X_i^*$ which are approximations of the $X_i$'s and which constitute independent clusters of variables (see Proposition 4).

• To pass from $\Psi_n$ to the $L^2$ norm, we introduce the following set:

$$\Delta = \Big\{\forall T \in \mathbb{S}, \ \|T\|_f^2 \le \frac{3}{2}\Psi_n(T)\Big\}.$$

We can easily prove (see Section 6.3) that $\Delta \subset \Gamma$. Then,

$$\|\hat\Pi_{\hat m} - \Pi_m\|_A^2\,\mathbf{1}_\Delta \le \frac{3}{2}f_0^{-1}\,\Psi_n(\hat\Pi_{\hat m} - \Pi_m)\,\mathbf{1}_\Delta.$$

It remains to prove that $P(\Delta^c) = P(\exists T \in \mathbb{S}, \ \Psi_n(T) < (2/3)E[\Psi_n(T)])$ is small enough. This is done in Proposition 2.

5. Simulations

To illustrate the method, we compute our estimator $\tilde\Pi$ for different Markov processes with known transition density. The estimation procedure contains several Fourier transforms. This may seem heavy but, for each noise distribution, the functions $v_{\varphi_{jk}}$ for all the basis functions can be computed beforehand. Here we use the Daubechies wavelet D20. Next, to compute $\tilde\Pi$ from the data $Y_1, \dots, Y_{n+1}$, we use the following steps (see Section 3.3):

• For each $m$, compute the matrices $G_m$ and $Z_m$,

• Deduce the matrix $A_m$,

• Select the $\hat m$ which minimizes $\gamma_n(\hat\Pi_m) + \mathrm{pen}(m) = -A_m^\top Z_m + \mathrm{pen}(m)$,

• Compute $\tilde\Pi$ using the matrix $A_{\hat m}$.

Actually, following the theoretical procedure, we should set $\hat\Pi_m = 0$ on $\Gamma^c$ (see Section 3.3) but, for practical purposes, it is more sensible to invert $G_m$ whenever possible. In all the examples examined below, the minimum of the spectrum of $G_m$ was never too small (so that we merely inverted it without using the set $\Gamma$). The reason is that $P(\Gamma^c)$ is very small: it appears in the proofs that it can be bounded using an exponential inequality.

We consider several kinds of Markov chains:

• An autoregressive process, denoted by AR and defined by

$$X_{n+1} = aX_n + b + \varepsilon_{n+1}$$

where the $\varepsilon_{n+1}$ are independent and identically distributed random variables with centered Gaussian distribution with variance $\sigma^2$. For this process, the transition density can be written $\frac{1}{\sigma\sqrt{2\pi}}\exp\big(-(y - ax - b)^2/(2\sigma^2)\big)$. We consider the following parameter values:

(i) $a = 2/3$, $b = 0$, $\sigma^2 = 5/9$, estimated on $[-2, 2]^2$;

(ii) $a = 0.5$, $b = 3$, $\sigma^2 = 1$, and then the process is estimated on $[4, 8]^2$.

• A radial Ornstein-Uhlenbeck process (in its discrete version). For $j = 1, \dots, \delta$, we define the processes $\xi^j_{n+1} = a\xi^j_n + \beta\varepsilon^j_n$, where the $\varepsilon^j_n$ are i.i.d. standard Gaussian. The chain is then defined by $X_n = \sqrt{\sum_{j=1}^{\delta}(\xi^j_n)^2}$. The transition density is given in Chaleyat-Maurel and Genon-Catalot (2006), where this process is studied in detail:

$$\Pi(x, y) = \mathbf{1}_{y > 0}\exp\Big(-\frac{y^2 + a^2x^2}{2\beta^2}\Big)I_{\delta/2-1}\Big(\frac{axy}{\beta^2}\Big)\frac{ax}{\beta^2}\Big(\frac{y}{ax}\Big)^{\delta/2}$$

and $I_{\delta/2-1}$ is the modified Bessel function with index $\delta/2 - 1$. This process (with here $a = 0.5$, $\beta = 3$, $\delta = 3$) is denoted by √CIR, since its square is actually a Cox-Ingersoll-Ross process. The estimation domain for this process is $[2, 10]^2$.

• A Cox-Ingersoll-Ross process, which is exactly the square of the previous process. The invariant distribution is a Gamma density with scale parameter $\lambda = (1 - a^2)/(2\beta^2)$ and shape parameter $\delta/2$. The transition density is

$$\Pi(x, y) = \frac{1}{2\beta^2}\exp\Big(-\frac{y + a^2x}{2\beta^2}\Big)I_{\delta/2-1}\Big(\frac{a\sqrt{xy}}{\beta^2}\Big)\Big(\frac{y}{a^2x}\Big)^{\delta/4-1/2}.$$

The parameters used are the following:

(iii) $a = 3/4$, $\beta = \sqrt{7/48}$ and $\delta = 4$, estimated on $[0.1, 3]^2$;

(iv) $a = 1/3$, $\beta = 3/4$ and $\delta = 2$. This chain is estimated on $[0, 2]^2$.

• An ARCH process defined by $X_{n+1} = \sin(X_n) + (\cos(X_n) + 3)\varepsilon_{n+1}$, where the $\varepsilon_{n+1}$ are i.i.d. standard Gaussian. The transition density of this chain is

$$\Pi(x, y) = \varphi\Big(\frac{y - \sin(x)}{\cos(x) + 3}\Big)\frac{1}{\cos(x) + 3}$$

where $\varphi$ here denotes the standard Gaussian density, and we estimate this process on $[-5, 5]^2$.
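For reference, the two transition densities above that involve only elementary functions can be coded directly, as in the sketch below (helper names are hypothetical). It checks that $y \mapsto \Pi(x, y)$ integrates to 1 on a wide grid, and shows that, for the ARCH chain, the estimation square $[-5, 5]^2$ does not carry all of the mass of $\Pi(x, \cdot)$. The √CIR and CIR densities would additionally need the modified Bessel function $I_\nu$, available for instance as `scipy.special.iv`.

```python
import numpy as np

def ar_transition(x, y, a, b, sigma2):
    """AR chain X_{n+1} = a X_n + b + eps_{n+1}, eps ~ N(0, sigma2)."""
    return np.exp(-(y - a * x - b) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def arch_transition(x, y):
    """ARCH chain X_{n+1} = sin(X_n) + (cos(X_n) + 3) eps_{n+1}, eps ~ N(0, 1)."""
    s = np.cos(x) + 3.0  # conditional standard deviation, between 2 and 4
    z = (y - np.sin(x)) / s
    return np.exp(-z**2 / 2) / (np.sqrt(2 * np.pi) * s)

def row_integral(density, x, y_grid):
    """Riemann sum of y -> density(x, y) over an equally spaced grid."""
    dy = y_grid[1] - y_grid[0]
    return float(np.sum(density(x, y_grid)) * dy)
```

For example, `row_integral(arch_transition, 0.0, np.linspace(-5, 5, 10001))` is clearly below 1, because at $x = 0$ the conditional standard deviation is 4.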

For this last chain, the stationary density is not explicit. So we simulate $n + 500$ variables and we estimate only from the last $n$, to ensure the stationarity of the process. For the other chains, it is sufficient to simulate an initial variable $X_1$ with density $f$.
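The two simulation schemes just described can be sketched as follows (a minimal illustration with hypothetical helper names; the starting value 0 of the ARCH chain is an assumption, since only the 500-step burn-in is specified). For the CIR chains, an integer $\delta$ allows a direct construction from $\delta$ Gaussian AR(1) coordinates started at their stationary law $N(0, \beta^2/(1 - a^2))$, so that the chain is stationary from the first step, with Gamma invariant law of shape $\delta/2$ and parameter $(1 - a^2)/(2\beta^2)$.

```python
import numpy as np

def simulate_arch(n, burn_in=500, seed=None):
    """ARCH chain X_{k+1} = sin(X_k) + (cos(X_k) + 3) eps_{k+1}, eps i.i.d. N(0, 1).
    Its stationary law is not explicit: n + burn_in values are drawn and only the
    last n are returned."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    eps = rng.standard_normal(total)
    x = np.empty(total)
    x[0] = 0.0  # arbitrary starting point (assumption)
    for k in range(total - 1):
        x[k + 1] = np.sin(x[k]) + (np.cos(x[k]) + 3.0) * eps[k + 1]
    return x[burn_in:]

def simulate_cir(n, a, beta, delta, seed=None):
    """CIR chain X_k = sum_j (xi^j_k)^2 built from delta independent Gaussian AR(1)
    coordinates xi^j_{k+1} = a xi^j_k + beta eps^j_k, started at their stationary
    law N(0, beta^2 / (1 - a^2)); np.sqrt of the output gives the sqrt-CIR chain."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n, delta))
    xi = np.empty((n, delta))
    xi[0] = rng.normal(0.0, beta / np.sqrt(1 - a**2), size=delta)
    for k in range(n - 1):
        xi[k + 1] = a * xi[k] + beta * eps[k]
    return np.sum(xi**2, axis=1)
```

For CIR(iii), i.e. $a = 3/4$, $\beta = \sqrt{7/48}$ and $\delta = 4$, the invariant Gamma law has parameter $1.5$ and mean $4/3$, which gives a simple check of the simulated path.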



Table 1. MISE $E\|\tilde\Pi - \Pi\|^2$ averaged over N = 200 samples.

process    noise      n = 50   n = 100   n = 250   n = 500   n = 1000
AR(i)      Laplace    0.579    0.407     0.270     0.230     0.209
           Gauss      0.599    0.480     0.313     0.272     0.245
AR(ii)     Laplace    0.389    0.294     0.195     0.155     0.139
           Gauss      0.339    0.304     0.280     0.273     0.271
√CIR       Laplace    0.171    0.138     0.123     0.118     0.111
           Gauss      0.199    0.169     0.150     0.142     0.139
CIR(iii)   Laplace    0.420    0.345     0.237     0.195     0.175
           Gauss      0.337    0.302     0.276     0.245     0.209
CIR(iv)    Laplace    0.525    0.403     0.337     0.304     0.292
           Gauss      0.369    0.345     0.344     0.327     0.321
ARCH       Laplace    0.312    0.287     0.261     0.185     0.150
           Gauss      0.337    0.319     0.296     0.290     0.183



We consider two different noises:

Laplace noise. In this case, the density of $\varepsilon_i$ is given by

$$q(x) = \frac{\lambda}{2}e^{-\lambda|x|}, \qquad q^*(x) = \frac{\lambda^2}{\lambda^2 + x^2}, \qquad \lambda = 5.$$

The smoothness parameter is $\gamma = 2$, so that the theoretical penalty is

$$\mathrm{pen}(m) = C\|q\|_\infty^2 f_0^{-1}\frac{D_m^{10}}{n} = C\frac{\lambda^2}{4}f_0^{-1}\frac{D_m^{10}}{n}.$$

Several simulations led us to fix a very low constant $C$. As the term $f_0^{-1}$ does not vary much compared with $C$, we choose to use the same following penalty for all the examples:

1 /A\ 2 /^ xl ° 



pen(m) = - - — 
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900 1000 



Fig 1 . Mean of the MISE for the six processes when n increases 



Gaussian noise In this case, the density of e$ is given by 

1 



q(x) 



AV27T 



e 



q*(x) 



A = 0.3. 



This noise does not verify Assumption HI but it is interesting to see if this 
assumption is also necessary for practical purposes. Given the exponential 
regularity of this noise, we consider the following penalty 

pen(m) = - exp(A 2 L>;L) 
n 

where, by simulation experiments, we calibrate the penalty with k — 5. 

Table Q] presents the L 2 risk of our estimator of the transition density for 
the six Markov c hains a nd the two noises. These results can be compared with 
those of Lacourl ( 2007a[ ) (Table 2) who studies the processes AR(i), VCIR and 
ARCH but directly observed, i.e. without noise. The risk values are then higher 
in our case, but with the same order, which is satisfactory. It is noticeable that 
the estimation works almost in the same way with the Gaussian noise, but with 
a slower decrease of the risk, as can be observed in Figure Q] . It is a classical 
phenomenon in deconvolution problems, since the Gaussian noise is much more 
regular than the Laplace noise. 

Figure [2] allows visualizing the result for process ARCH observed through a 
Laplace noise: the surfaces z = H(x,y) and z — n(ar, y) are presented. We also 
give figures of cross-sections of this kind of surfaces. We can see in Figure [3] the 
curves z = H(x, — 0.44) versus z = fl(x, — 0.44) and the curves z = 11(1.12, y) 
versus z = 11(1.12, y) for the process AR(i). Generally, for a multidimensional 
estimation, the mixed control of the directions does not enable to do as well as a 
classical one-dimensional function estimation. Nevertheless here the curves are 
very close. 

From a practical point of vue, it is difficult to compare the method described 
here and the one of Lacour (|2007bh . Indeed, the bases used are very different. 
However, we can say that the quotient method seems to give better results when 
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y= -0.44 x=1.12 



Fig 3. Sections for process AR(i) observed through a Laplace noise, n = 500 

the noise distribution is Gaussian (that is conform to theory). Nevertheless, the 
least squares procedure is better for a Laplace noise, especially when n is small. 

6. Detailed proofs 

6.1. Proof of Lemma [7] 

• Using 

\^U X )\ ^ CX^HMIco < CW, (8) 
P3 holds if $! > 2C'(ip). 
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• The computation of v Vjk gives 

lv>)l < ^ / , 'f ( 1U 
I Pjfc*. )\ - 2jt J l g *(_ w2 J)l 

Next, it follows from Assumption HI that |w ViS .(£)| < Ci i7 (2 : ') 7+1 / 2 /27rfco us- 
ing Lemma [5] (Section l6.8|) since r > 7 + 1. Then, for all x, J2k \ v Vjk ( x )\ 2 — 

3.2 3 C 1 2 7 ^(2J) 2 ''+ 1 that establishes P4 with > 3Cl y k~ 2 /(in 2 ). 

• To prove P5, we apply the Parseval equality. That yields 



2 



1 f \<p*(v)\ 



"Vjk I 



2 



2tt J \q*(-v2i)\- 



-dv. 



Using HI and given that 2r > 2-f + 1, we obtain / \v Vjh \ 2 < C 2 ,i 1 {2 j ) 21 /2-Kkl 
And finally P5 holds with $1 > 3C 2 ,2-yk~ 2 /(2tt). 

• We begin with computing \v VjkV . k , [x)\ by using that (ifjk^Pjk')* is equal to 
the convolution product ip* k * (p* k , . 

\v VjkVjhf {x)\ < 2jT J J \q*(-u)\ 

<hl V (Vr ff \<p*(y)<p*(x-y)\(x 2 + iy/ 2 dxdy. 



2vr 

Then Lemma [5] (Section I6.8|) shows that 

kr 1 



K ihVjh ,{x)\ < ^-(2 j y +1 c r / \x\ 1 ~ r {x 2 + iy/ 2 dx 



27r L J|x|>l 

(x 2 + iy ,2 dx 

\x\<\ 

Hence, since r > 7 + 2, there exists C > such that (v^w O*")! < (7(2- 7 )">' +1 . 
The fact that tpj^ and ^-fe/ have disjoint supports if k + iv < k' — N + 1 or 
fc' + iV<£;-iV+l enables to prove P6 with $1 > 3(4^ - 3)C 2 . 
• Applying Parseval's equality, 

2P_ f JfeMlW, 

- | g *(-2^)i 2 

But, using Lemma 

|(^^- fc 0*(2 J «)| < y - y)\dy < C r [Iv^l^ + l w <i] (9) 

Then, it follows that 
K 2 2> 



2n 



C 2 / (\v\ 2 l 1 ^h H>1 +l\ v \< 1 )((2*v) 2 +iydv < C(2^ +1 . 



It is then sufficient to sum this quantity for all k, k' by taking into account the 
superposition of the supports to prove P7 as soon as $1 > 3C(AN — 3). 



C. Lacour / 'Estimation of the transition of a hidden Markov chain 19 

6.2. Proof of Lemma\E 

1. First we write that 

V T (Y k ,Y k+1 ) = -L f e flt»*W J*(^ dudv 
An 2 J q*(-u)q*{-v) 

so that, by denoting X = (Xi, . . . , X n+ i), 

E[V T (Y k ,Y k+1 )\X] = / E[e^"+^^"|X] J*^^) . dudv. 

An 2 J q*{-u)q*(-v) 

By using the independence between (Xj) and (e^), we compute 

_ e iX fc u+iX fc+1 o E |- e i efc uj ]E [ e fe h+1 «] = e iX fc u-MX* +l « g *^_ 1i ^*^_^ > 

Then 

3. We proceed in a similar way for Q. Since Q T (Y k ) = {1/2%) J e lYkU T*(u, 0) 
then 



E[Q T (y fc )|x] = ^Jn 



By using the independence between (X) and (e^), we compute 

E[e lYkU \X] = E[e lXkU e l£kU \X\ = e lXfc "E[e i£fcU ] = e lXfc V (-u). 

Thus 

E[Q T (Y k )\X] = ±- [ e iXkU q*(-u) T y'% u=±- f e iX * u T*(u,0)du. 
2tt J q*(-u) 2tt J 

By denoting by T y the function x i— > T y {x) — T(x, y), we obtain 
T*(u,0) = ^ e- lxu T y (x)dxdy = J T* y {u)dy 

and then 

_L /" e «*«T*( U ,0)d«=^ /Y e lXkU T;(u)dydu 



f f (10) 

/ T y (X k )dy = / T(X k ,y)dy. 
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6.3. Proof of Theorem^ 

We start with introducing some auxiliary variables whose existence is ensured 
by Assumption H5 of mixing. In the case of arithmetical mixing, since 6 > 8, 
there exists a real c such that < c < 3/8 and c9 > 3. In this case, we set 
Qn = l nC \- I n the case of geometric mixing, we set q n = |_cln(n)J where c is a 
real larger than 3/6. 

For the sake of simplicity, we suppose that n + 1 = 2p n q n , with p n an 
integer. Let for I = 0, . . . ,p n - I, Ai = jj£ajg n +i, — » X (2l+i)q n ) and i?; = 
( x {2i+i)g n +i, X {2 i + 2) q J- As in IViennetl (|l997n . by using Berbee's coupling 
Lemma, we can build a sequence (A*) such that 

A; and A* have the same distribution, 
A* and A*, are independent if I ^ (11) 

A*) </?,„. 

In the same way, we build (-B*) and we define for any / £ {0, . . . ,p n — 1}, 

A* = P^2i 9 „ + l> — i ^(*2i+l)g„)' ^* — ( J ^(*2; + l) 9 „ + l' ^(*2i+2) ? „) S0 tnat tne Se ~ 

quence (X*, . . . , A*) is well defined. We can now define 

n* x = {Vi, 1 < i < n + 1 A, = A*}. 

9 1 

Let us recall that S is the space S m with maximal dimension V < n 4 ~t+ 2 . 
We now adopt the notations 

3 
2 

Let us fix m € M n . We denote by II m the orthogonal projection of IT on § m . 
Then we have the decomposition 



A = {VTeS \\T\\j < -*„(T)}; Q = An^. 



|n — n|li < 2E(||n-n m |||ini|| ftA ||< fe J +2E(||n-n m ||iini|| n 



+2E ( ||n - n m \\ 2 A t n . ) + 2||n m - n 112 



A 



< 2E(||n 7fl -n m ||^i n ) +2||n m ||iE(i n i||n ri i| >fci 



2 

A ■ 



+2E ([2||fi|ft + 2||n m |ft]i n .) + 2||n m - n| 

Now, using the Markov inequality and the definition of II, 

:i|n™|| 2 io) 



E||n - n||i < 2E (j|n A - u m \\ A t n ) + 2||n||l 

+4(fc 2 + ||n|| 2 4 )E(io c ) + 2||n m -n|| 2 4 . 

But E(||n A || 2 l n ) < 2E(||LL fl - n m \\ A t n ) + 2\\n m \\ 2 A and k n = yfti, so 

EUn-nni < 2E(||n A -n m || 2 1 i f2 N ) (i + 2\\u\\ 2 A ) + ffl^ 



n 

-4(n + ||n|| 2 4 )P(^) + 2||n m -n|| 2 . 
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We now state the following proposition : 
Proposition 2. There exists Co > such that 

Hence 

E||n - n|ft < 2||n m - n|ft + 2E (yn* - iimllW) (1 + 2||n|ft) 

4 (12) 

+ _(||n||i + c7o(i + ||n||i)). 

Now we have to bound E (j|ri r h — II TO ||^ln^ . The estimators fl rn are dehned 
by minimization of the contrast on a set T defined in ([5]). Let us prove that this 
set T contains f2. More precisely, we prove that AcT. For T — J2\ °a^a G §>m, 
the matrix A m — (a\) of its coefficients in the basis (oj\(x, y)) verifies 5 , „(T) = 
t A m G m A m . Then, on A, 

t A m G m A m > ^\\Tf f > ^f \\T\\ 2 . 

Now, using P2, ||T|| 2 = t A m A m and then t A m G m A m > (2/3)f t A m A m . If /z is 
an eigenvalue of G m , there exists A m ^ such that G m A m = fj,A m and then 
H A m A m . Then, on A, 

t 2 t 

M A m A m > — /q A m A m . 

Consequently /i > (2/3)/o- So A C T and minimizes the contrast on A. 
We now observe that, for all functions T, S 

2 ™ 

7n(T) - 7 „(5) = * n (T - S) - - J2l V (T-s){Yk, Y k+1 ) - Q ( t-s)s(^)]- 

fe=i 

Then, since on A, 7„(f[ rft ) + pen(m) < 7„(n m ) + pen(m), 
2 " 

*„(n A -n m ) < -E[ y (n^-n m )(^' y fe+i)-Q(n^-n m) n m (^)] 

11 k=l 

+pen(m) — pen(m) 

< 2Z n ^ rn (tl m - n m ) + pen(m) - pen(m) 

< 2||n jfl - n m ||/ sup Z n , m (T) + pen(m) - pen(ra) 

T<EBf(m,m) 

where 

1 - 

Z n ,m{T) = - Y\V T {Y kl Y k+1 ) - Q T n m (Yk)} 
n * — ' 

fe=i 
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and, for all mf, B f (m,m') = {Te S m + S m >, \\T\\ f = 1}. Now let p(., .) be a 
function such that for all m,m', 12p(m,m') < pen(m) +pen(m'). Then 

*„(n A -n m ) < hil m -Il m \\ 2 f + 3[ sup Z 2 ;ro (T)-4p(m,m)]+2pen(m). 

TeB f (m,m) 

So, using the definition of A D 0, 

lift* - n m || 2 i f2 < |* n (n A - n m )i n 

< J||n A -n m ||^l n + | J! t sup < m (r)-4p(m,m / )]ln + 3pen(m) 



cj || ' / 1 ~/ft || j —at ' r) 

2 ^ m'*iXl n r€B / (m,m') 



Thus 



htl m - U m \\}l n < I [ sup ^, m (T) - 4p(m,m')]lo + 3pcn(m) 



Q || "HI J a£ — C\ 

1 Z m'eM n TeB f (m,m>) 

And using Assumption H3, 



||n™-n ro ||2 In < 9/q" 1 V [ sup Z2 im (T)-4p(m,m / )]ln+6/ - 1 pen(m) 

(13) 

Now, by denoting Ex the expectation conditionally to X\, . . . , X n+ \, the 
process Z„ iTO (T) can be split in the following way : 

Z n , m (T) = ZW(T) - Z< 2 >(T) + Z^(T) + JJ T(x,y)(U-U m )(x, y )f(x)dxdy 



with 



-. n 

Z n 1] {T) = - V V T (Y k , Y k+1 ) - E x [V T (Y k , Y k+1 )} 
1 ™ 

Zi 2) (T) = - E ^ (Y k ) E[Q TIIm (Y k )} 

k=l 

1 ™ 

4 3) C0 = -^(iaHi)-E[%,%)] 

Then, by introducing functions P\(., ■), p%{-, ■) and p 3 (., .) 

sup Zl m (T)-Ap(m,m') <A sup (Z«(T) 2 - Pl (m,m')) 

TeB f (m,m') TeB f (m,m') 

+4 sup (4 2 )(T) 2 - P2 (m,m'))+4 sup (Z^ (T) 2 - p 3 (m, m')) 

TeB f (m,m') TeB f (m,m>) 

+4{{pi+P2+P3)(m,m')-p{m,m'))+4: sup ||(n - n ro )l A ||^||T||^ 

TeB f (m,m') 

We now use the following propositions. 



C. Lacour / Estimation of the transition of a hidden Markov chain 



23 



Proposition 3. Let pi(m,m') = Ki('j)^ifo 1 \\q\\ 2 X) D^,t' 2 /n w here m" — 
max(m,m'). Then, if r > 2"/ + 1/2, there exists a positive constant C\ such 
that 

sup Z«(T) 2 - Pl (m,m') 1 ( ' 

TeB f (m,m') 



E E 



m'eMr. 



< 



Proposition 4. Let p 2 (m,m') — p%~' (m, m') + p 2 z> (m, m') with p^' '(m, m' 



(2), 



K 2 \\TL\\\ D%+ 7/2 /n andp?>(m,m') = K 2 \\n\\ 2 A (J2 k f3 k )D? n „ /n where m" = 
max(m, m'). Then, if r > 27 + 3/2, there exists a positive constant C 2 such that 

C-2 



,(2) 



E E 



m'eMr. 



sup Z^(T) 2 -p 2 (m,m / ) 
TeBf(m,m') 



In < 



Proposition 5. Let p 3 (m,m') — K 3 J2k &^m" / n w ^ ere m " = max(m, m'). 
Then, there exists a positive constant C 3 such that 



E *( 

»'e.M„ \ 



sup Z^(T) 2 -p 3 (m,m') 
TeBf(m,m') 



-1 + 



\ C 3 
In < — • 

n 



The first two pr opositio n s are p roved in Sections l6.6l and f6.7l The last propo- 
sition is proved in iLacour ( 2007b ) Section 6.5 (for another basis but only the 
property P3 || J2jk "PffclU < $iAn is used). 

Then we get 



E 



m'eMr. 



SU P Z n,m ( T ) - 4 P( m : m ') 
TeB f (m,m') 



In < 4 



C\ + C 2 + C3 



+4||(n-II m )i i i||/+4 ((Pi+P2+P3)(m,m')-p(m,m')). 

m'GMn 

But, if 7 > 3/4, 47 + 2 > 27 + 7/2 and there exists m 2 such that for all m' > m 2 , 
Pi{m,m!) > p 2 {m,m!) + p 3 (m,m!). It implies that 

(pi(m, m') + p 2 (m, m!) + P3(m, m!) — 2p\(m, to')) 

m'GMn 

C(m 2 ) 



< 



{p 2 (m,m') + p 3 (m,m') —pi(m,m')) 



< 



Thus in the case 7 > 3/4, we choose p = 2pi and 



E E 



SU P m( T ) " 4 P( TO ) 

TeB f (m') 



In < 4 



Ci + C 2 + g 3 + C(m 2 ) 
n 

+4||/|| 00lAl ||n-n m ||^ 

(14) 
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If 7 = 3/4, we choose p — 2{p\ + p^). Since there exists to 2 such that for all 
ml > m 2l Pi (to, to') + p^ (to, to') > p 2 (to, m!) + P3(to, to'), we can write 



(pi(to, to') + p2(m, to') + p^{m, to') — p(m, to')) 
< (p| 2 ' (to, to') + ps(m, to') — pi(m, to') — p^ 1 ' ( m i to')) < — - — — 



m'<m2 



and (HU) holds. 

Finally, combining lfl2]). (fT3j) and dT4j) , we obtain 



E||n - n||* < 2||n m - n||* + -(||n|| 4 4 + c (i + ||n||* )) 

n 



C 1+ C 2 + C 3 + C(m 2 ) , , |f|| ||TT „ ||2 



2(1 + 2||n||l)9/ - i 4^ 2 ^ + 411/iu^nn - n 



+2(l + 2||n||^)6/ - 1 pen(m). 
Then, by letting C = max(2 + 72/ - 1 ||/|| co , Al (1 + 2||n|ft), I2f^{l + 2||n|ft )), 

c 

E||n-n|||<c mf (||n m -n||^ + P cn(TO)) + — 

We still have to verify that 12p(m, to') < pen(m) +pen(m'). But, if 7 > 3/4, 

I2p(m,m') = 24#i = 24if 1 dim(5m + Sm '^ +2 < pcn ( m ) + pe n(m') 
n n 

with pen(m) > 24K 1 D^ +2 /n. And if 7 = 3/4, 

n5 

12 P (to,to') = 24(K 1 +K 2 \\II\\a)— !I!1 < pen(m) + pen(m') 

n 

with pcn(TO) > 24(iTi + K 2 \\IL\\ 2 A )D%+ 2 /n. 
6-4- Proof of Corollary\l] 

It follows from lMeverl ( 199Clh Chapter 6, Section 10 that II belongs to B^^ii and 

only if sup j y J 2 2: > a (J2 k x \a jk i\ 2 ) 1/2 < 00 with aju = J U(x,y)ip jk (x)ipji{y)dxdy. 
Then 

4(n, s m ) = E E M 2 ^ c E 2 ~ 4jQ ^ c '^» 2a 

j>m k,l j>m 

Since d|(n,5 OT ) = 0{D m 2a ), Theorem Q] becomes 

D 4 7 +2 

E||H - H||i < C" inf {D~ 2 <* + -S— }. 

meMn n 

with C" a positive constant. By setting D mi the integer part of n 1 /^ 1+2a+2 \ 
then 

E||n - n||i < c"{D m 2a + ^1^} = o( n -3^+2). 
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6. 5. Proof of Proposition [2| 

Wc first remark that P(Q C ) < P(Qx) + P ( AC n ^*x)- In thc geometric case 
Ar„ < e~ ecln (") < n-° c and in thc other case 9n < {q n )-° < n- 0c . Then 

P(Q£) < 2 Pn f3 qn < n x ~ ce . 

But, c6 > 3 and so P{Q*x) < n~ 2 . We still have to bound P(A c nQ^). To do this, 
we observe that if w € A c , then there exists T in S such that \\T\\ 2 f > (3/2)*„(T) 
and then \\T\\ 2 f > (3/2)E x *„(T). But E x * n (T) = 1 £™ =1 / T 2 (X k ,y)dy. So 

p(A c n n* x ) < p(a ,c n n* x ) with 

A' = {VTeS l|TH/<^E / T 2 (X k ,y)dy}. 

k=i J 

Let us remark that (1/n) ^Li / T2 ( x k,y)dy - \\T\\ 2 = v n (T 2 ) with 

^( T ) = ^E / TO.wj-Efnxi,!,))]^. 

z— 1 

Hence 

P(A' C n O^) < P(sup K(T 2 )|l n * > 1/3) 
Tee A 

withB = {Te<S 1^11/ = !}. 

A function T in S can be written T(x, y) = X)j=j Efcz a jki<Pjk(x) l Pji (y) where 
mo is such that S = § mo . Then 

v n (T 2 )±n* x = E a ]kia 3 k'ii' n (ip : j k ipj k i) 

jkk' I 

where 

1 " 

p n (u) = -E)[«TO-E(«ra)]- as) 

n * — ' 

Let 6 jfe = (Ei a jfci) 1/2 > the n K(r 2 )|lUr Y < J2jkk' bjkbjk>\v n {<Pjk<Pjk>)\ and, if 
^6, H 3k b%^T, 3kl a 2 kl ^\\T\\ 2 <f^ 
Thus, 

sup |i/ n (T 2 )|la« < sup ^fe^fe' 1^ (^fcV^jfc' ) | - 
Te6 J2 b %=^kk> 

For the sake of simplicity, we denote A = (j, k) and A' = (j, k') so that 
sup \v n {T 2 )\t n * < f^ 1 sup ^ b\b\>\D n (ip\ip\,)\. 
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Lemma 3. Let B x ,v = H^a^A'IU and V \,\' = II^A^A'lb- Let, for any sym- 
metric matrix (A\^\>) 

p(A) = sup |a A a A '|^A.A' 
and L(tp) — max{p 2 (V), p(B)}. Then there exists $o > such that L(ip) < 



This lemma is proved in iBaraud et al. (2001) for an orthonormal basis veri- 
fying || J2x ^aI1°o — ^0^5 that is ensured by property P3. 



Now let x = 



fo 



24\\f\U Al L&) 



and 



D= VAVA' \v n ( Vx ip. x ,)\< 



On D: 



Bx,X'X + Vx,x> y^H/Hoo^x 



sup K(T 2 )|1 . < f^ 1 sup Y^bxby B x .x>x + Vx.x>j2\\f\\ 00 , Al x 



<fo 



< 



p(B)x + p(V)^2\\f\\ ooAl x 



fo P{B) + 1 (?{V) 



24||/]] 0O , Al i(^) 



1/2 



1 1 1 

" 24 + 2V3 < 3" 



Then P sup \v n (T 2 )\t n * > 1/3 < P(D C ). But z? n (u) = P„,i(u)/2+P n , 2 (u)/2 

\T£B X 



with 



Pn-l 



«n,«(«) = V" Y l,s(u) S = 1,2 



i=0 



with < 



1 



yn 



=2(2/+l)<j„ + ll 



To bound P{v nA (ipxV x') > B x .x'X+V x .x' \/2\\.f\\^ . A ,x), we will use the Bern- 
stein inequality given in iBirge and Massartl (|l998h . A fast computation gives 
E\Y ltl {<p x <p x ,)\* < ^-HBx.x'T-HVWfWocA^x') 2 . And then 



P(\v n ,.('P\<P\>)\ > Bx,x-x + Vx,x^2\\f\\^ Al x) < 2e- p - x . 

Let C = /o [48||/||oo,Ai] -1 , so that x = 2C/L(<p). Given that P(A C n Q* x ) < 
P(D C ) < Ea.A' p (l*nWv)| > BA,vai + ^A.A' VWfh^), 

2 Pn c] 



P{A c nn* x ) < 4P 2 exp 



Hip) 



< 4n 1 /( 2 ^+ 1 ) exp ^ -C 



q n L(ip) 
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But L(ip) < $ P 2 < $ « 1/(27+1) and q n < n 1 ' 2 so 

P(A C n nt) < 4n 1 /( 27+1 ) cxp (--nW^J 1 < ^ 

I $o J n 2 

because 7 > 1/2. 
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6. 6. Proof of Proposition [3| 



First we need to isolate even terms from odd terms in Zn\T) to avoid overlaps: 
Z n 1} (T) = i(4 1,1} (T) + 4 1,2) (T)) with 



1 - 

= - E V T (Y u Y l+1 ) -E x [V T (Y t ,Y t+1 )] 

i— 1,2 odd 

1 " 



i—lA even 



It is sufficient to deal with the first term only, as the second one is similar. For 
each i, let Ui = (iai-i,!^), then 

n/2 

Z " M)(T) = ^72 S < WO - E ^[^(c/,)]} ■ 

Notice that conditionally to X^, .... Jf n , the L^'s are independent. Thus we can 
use the Talagrand inequality recalled in Lemma [S] to bound 



E 



sup Z£> 1 \T) 2 - Pl (m,rri) 

TeB f (m,m') 



Wc first remark that Property PI entails Bf{m, m') C § m " with m" = max(rn, m') 
Then, if T belongs to Bf(m,m'), 



T(x,y) = ^^tt ]H ^(i)^(!;) 

j=J kl 

w^h E^wHmi^/o" 1 - 

• Let us bound ||Vr||oo for T in Bf(m, to'). HT(x,y) = J2jki a jkifjk(x)(pji(y), 
I V T (x, y) 1 2 < E OjH E I V <Piu9 Vi x fry) | 2 . 

jfci jkl 

Then, since F slglt (a;,y) = v s (x)v t (y), 

sup |Vt(x,?/)| 2 < /(T'El^^^)^!^)! 2 - 

TeBf(m,m') jkl 
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But, according to Property P4, || £ fc K Jfc | 2 ||oo < $i(2 j ) 2 t+ 2 . So, using Lemma[H 

m" 9 4 7 +4 

sup < U^j £(2^+ 4 < fo^li^+i—rD^ 4 

TeB f { m ,m>) 

and Mi = / _1/2 *i ^2 4 ^+ 4 /(2 4 ^+ 4 - l)!^ 2 . 
• To compute -ff 2 we write 

E X [ sup Zf'^lfl^/o^Ex^fei^j,) 2 ] 



TeBf(r 



jkl 



< E Var * ^ E ( F »K-< ps+i) 

jjfei \ i— 1,2 odd / 

^ ^ E ^ Var * K* ( yi K< (%)) 

<^EMK 3 ^lW^)l 2 ] (16) 

Here Varx denotes the variance conditionally to Xi, ... ,X n +±. Now, for any 
function G, the following relation holds 

Ex[|G| 2 (y x , Y 2 )] = E X [\G\ 2 (X 1 + e x ,X 2 + e 2 )] 
|G| 2 (Xi + zi,X 2 + z 2 )q(zi)q(z 2 )dzidz 2 

\G\ 2 (u 1 ,u 2 )q(u 1 -X 1 )q(u 2 -X 2 )du l du 2 < IM^HGU* 

Now, coming back to (fT6ll , 



sup 4 W W 

T£B f (m,m') 



<^ll9ll»EK^^II 2 
<^Mkj2 (ek*ii 2 1 < * ;/ °~ 1||g|l - f>y^ 2 , 

using P5. Then, according to Lemma H H 2 = <5> 2 l fQ 1 \\q\\ 2 oc 2^+ 2 /{2 i ~< +2 

n 4 7 +2 

1)^L_. 

n 

• There remains to find u. First 

VarxOW^+i)) <E x |M^+i)| 2 < IkllLll^TH 2 
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We now observe that ||^r|| 2 = ||V^|| 2 /(47r 2 ) and then 



< 



< 



1 

1 

4^2 



T*(u,v) 



q*(—u)q*(—v) 



dudv 



^^ W//^ (u ' w)|adu ^ 



\ 



4f 3klZ 7lJJ q*(-u)q*(-v 4 V 11 1 



For T e B f (m,m'), 



\\Vt\\ 2 < 



fo 



-1/2 



2tt 



\ 



j ki 



k*(-«)l 4 7 |«r(-u)| 



But {ip jk )*{u) = 2-^ 2 e luk / 2J ip*(u/2i) and then 



< 



|g*(-u)|4 

lv»l 2 



2"V(«/2 , *)| : 



du 



\q*(-vV)\* 
Since r > 27 + 1/2, Lemma [5] gives 



|«*(-U)|4 

du<fe^ 4 (2 J ') 4 T / \tp*{v)\ 2 {v 2 + l)^dv 



E 



I^WI 2 _ du /• l^,(«)| 



l<z*(-«)l 4 7 |g*(-«)| 



r du < 3.2 2 JC 2 2 47 fc " 8 (2^') 



87 



Then, using Lemma [4] with p = 87 + 2, 

02,47/0 4 



2?r 



^3(2^)87+2 < 
\ j=J 



2tt 



287^ 



2') 



D 



47+1 



and « = \\ q \\lC 2A ,fc 1 k^V^^D%t 1 /(2nV2*-«+ 2 1). 
We can now apply inequality (fl"9|) 



E[ sup \Z^\T)\ 2 -&H 2 ] + <C 

T£B f {m,m') 

( D il+1 

< C' m " e~ klD " 



-e + — ie 2jVf i 



47+4 



-k' 2 */n/D n 



Yet there exists a positive constant if such that 

E x^V*!^' < jr. 



C. Lacour / 'Estimation of the transition of a hidden Markov chain 30 

Moreover, since D m „ < , D ^ e -k' 2 ^i/D m „ /n < n i/{2 1+ i) e -kW^+» 

so that 

E D^e- k '^ D -" /n 2 < K'/n. 
Then, setting K t = 6$2/- 1 || g ||^2 4 T+ 2 /(2 4 ^+ 2 - 1), 

n 4 7+2 r n 

and the proposition is proved. 

6. 7. Proof of Proposition [^] 

Since IT m belongs to S m , it can be written 

n m (a;,y)= ^ E b rk'i"Pj'k'{x)(pfi'{y) 
j'=j (k',i')eA jr 

with Y^fk'l'bfk'l' = l! n m|| 2 < ||n||^. From the embedding Bf(m,m') C S m » 
(where m" = max(m, we have, if T belongs to Bf(m, m'), 

m 

= E E a okmk{x)Vji{y) 

withE^ ^« = imi 3 </ - 1 . 

We use the Talagrand inequality (fT9|) in Lemma [6] But the variables Yi are 
not independent. We shall use the following approximation variables 

Vl<i<n+1 Y*=X*+ei. 

These variables have the same properties as regards the l^'s as the X*'s as 
regards the X,'s (see (|TTj) ). More precisely, let, for I = 0,...,p„ — 1, C; = 

(*2Zg„ + l, ■•■,^ / (2/+l)g„), A = (^(2; + l) 9 „+l, ■ • • , ^(2/+2) 9 „ ) , Q = ( Y 2lq n + 1 > ■ • ■ > 

F (2i+l)g„)' D * = ( y (2i+l)gn+l'-' F (W2)g n )- Then ' since A < and A* have the 
same distribution and the sequences (ei) and (Xi) are independent, C/ and 
C* have the same distribution. Moreover the construction of via Berbee's 
coupling Lemma implies that C ; * and Cp are independent if Z =/= V . At last 

Picket) <P qn - 

Now we split Z {2) into two terms: Z n 2) (T)t n = (l/2)Z (2X ] {T)+(1/2)Z (2X \T) 
where 

p n -l (2/+l)<?„ 



Zp)(T) = -^l £ QTU m (Y*)-nQTn m (Y*)] 

Pn l=Q Qn i=2lqn+ i 
Pn-l (2i+2) 9 „ 

Z (2, 2)(T) = £ £ QnuO?) -E[Q T n m (y;)] 
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Then we apply Talagrand's inequality to Zn (T). 

• Let us first compute M\. We have to bound ||QTn m ||oo for T in Bf(m, ml). 
By linearity of Q 

Qrn m (x) = E a J feZ E b J , wQ'Pjk<p j iki®<Piiv j ii>( x ) 
jkl j'k'V 

Then, since Q s ®t{%) — v s (x) Jt(y)dy, using the Schwarz inequality 



\Qru m (x)\ 2 < ^a^wElV^^ 1 ) / 

< /o^linil^K^Wi 2 

jkk'l 

since the tpji are orthonormal. The property P6 gives then 



<Pjl<pj'l' 



IQmJlL^/o^linil^iE^) 27 ^' 

3 = J 

1/2, 



so that (using Lemma H again) M x = / i/ lL T || Ax /$i2 2 T+ 4 /(2 2 'r+ 4 - 
• Now, we compute H 2 . For T € Bf(m, m'), 

l4 2,1) (r)i 2 < 5>?« E l^ 2,1) fo* ® ^)i 2 



-)7+2 



Thus 



E 



sup Z^ ,x \T) 

TeB f (m,m') 



</ - x X;Var 



in 

p„-l 1 (2Z+1) 9 „ 

r 51 7" X! ( 3(vj fc «'¥'j!)n m (^*) 



Pn ;=o 9 " i=2Z 9n +i 



The variables (C*) are independent and identically distributed so 



E 



sup Z^\Tf 

T&B f (m,m') 



</o _1 Ef Var f 



jfei ™ L 1 " i=l 
However, on f2, Cx and have the same distribution, so that 



Var 



1 in 

— E Q{<P j k®<Pji)n m : 7) 
" n .'—1 



Var 



1 9» 



i=l 



And, coming back to the definition of Qt, for ii 7^ 12, 



4tt 2 

1 
4^2 



E(e 



g*(-u) 

(e iX4 i"e- iX *^)[(^ fe <g> ^i)nm]*(«,0)[(^* ® ^ JJ )II ro ]*(-»,0)dud» 
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since E(e l£ ^ u e- te ^ v ) = q*(-u)q*{v). Now using (fTD|) . 

= cov ( / (fjk <8 tp j i)U m (X n ,y)dy, / (^ fc tp j i)U m (X i2 ,y)dy) 



It implies that 



Var 



+Var 



^S Var ^(w*®w)n„( y i)] 
1 Hn f 

— E (Vjk ® ¥>ji)Tl m {Xi,y)dy 

hi - , J 



And then 
E 



sup ^(T) 

T£Bf(m,m') 



^ fo 1 E rVvar[Q fefc ^. i)nm (ri)] 

i r i q,z r 

+ /o 1 E — Var — E I i i Pjk® l Pji) Tl m{X ll y)dy 



For the second term in (|17|) . we use Lemma [7] to write 
" 1 q " f 

— E (<Pjk® <Pji)Tlm(Xi,y)dy 
_q n . =1 J 

II El/ ( ( Pjk®ip j i)U m (. 1 y)dy\ 2 \\, 



E Var 



For all real a; 



Therefore 



/ {fjk ® l Pji)' n -m(x,y)dy = ^ bjk'ifjk^Pjk'(x) 

k> 



El / (^i fe ® ^/)n m (a;,y)d2/| 2 < ||II|Q \<p jk cp jk ,( 

jkl jkk'l 



X )\ 2 < \\uf A ^ m ,. 



using property P3. Then 



^Var 

jkl 



< 



Thus we have bound the second term in JTTJ) by 2/ _1 £ fc /3fe||II||^$?D^„/ 
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For the first term in (fl7|), we bound J^jki E [IQ(v Jfc «>^ i )n m ( y i)| 2 ] : 

T,n\Q( VjkM ujY 1 )\ 2 } < e^EEi f PiwrfuKuw&itf 

jkl j'k'V jkl j'k'l' 

< ||n||iE 2J Elw^i)l 2 

jkk' 

But ¥.\v VjkV . k , (Yi) | 2 = J \Vtp ih tp. h , (x)\ 2 p(x)dx where p is the density ofYi. Since 
P = 9* /> 1^(^)1 < Iklloo for all x. Then 



and 



EK 3 ^ Jfc ,(n)r < Iklloo / K DkV]k ,{x)\ <ix 



r 

(n)| 2 ] < ||n||i||g|UE2^E / K 3kVjk ,(x)\ 2 dx 
jki j=j kk> J 



^liniliNlooiiE^^^WlslU*! 



927+3 

1 r)2 7 +3 

227+3 _ ^ to" ' 



applying Property P7 and Lemma [4] We finally obtain 



E 



sup Z^\Tf 

T£Bf(m,m') 



< 2/ - 1 ||n||i||g|| oo $ 1 



227+3 £) 2 T+ 3 



2/ - 1 EAII n lli $2 



227+3 _ 1 n 



Since the order of nH 2 has to be larger than the one of v, we choose 



H 2 = 2/ - 1 ||n||i$ 1 max(|| g || co 2^+ 3 /(2^ +3 -l),<&i) 



D 



27+7/2 



Lastly, using Lemma [7] again 

(2Z+1) 9 . 



(2/+l)?„ 



Var(i E QTn m (^*))=Var(i £ Q|n m (^)) 



i=2ig„+l 



i=2ig n + l 



< -E[|Q T n m | 2 (yi)6(F 1 )] < -||g Tnm || 00 (E[|Q T n m | 2 (ri)]) 1/2 (E[6 2 (Y 1 )]) 1 / 2 
9« g« 



V2E fc (fc + l)/3 fc , |n 



; (E[|Q T n m | 2 (Fi)]) 1/2 



(18) 



We have already proved that HQthJU < / " 1/2 ||n|U v /$ 1 2 2 7+4/(2 2 7+4 _ 1)D^+ 2 . 
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Now we need a sharp bound on E[|QTn m 0'i)| 2 ]- We have 



E[|QTn m (li)| 2 ] < H^Hoo / |QrnJ 



Ikll 



2tt 

Then it follows from the Schwarz inequality that 



(Tn m )*(M,o) 



q*(-u) 



2 



du 



|2l ^ \\q\U [\{TIV m Y(uM 2 



n\QTn m {Yx)\ 2 ] < j \f(-u)\* du ]J J i( Tn m)*M)i 2du 

We will evaluate the two terms under the square roots. First observe that 

(Tn m )*(u,o) = X X a 3kibfk'v{ i ^jk i ~P]'k')* {u){ipjnp rv y {q) 

jkl j'k'V 

jkk' I 

since (<Pji<Pj>i>)*(0) = J tpji<ffi> = tj=j>,i=i>. Then 



\(TU m y(u,0)\ 2 du< / Kfjk^jk'Tiu^du 

jkk' l jkk'r 

<2^/ - 1 ||n|| 2 4 ^ J \(^k^k')(u)\ 2 du 

jkk' I 

j k' k=k'-2N+2"' 

by taking into account the superposition of the supports. Using now {SJ 
J\(TU m y(u,0)\ 2 du < 2^/ - 1 ||IT|| 2 1 ^C'(^(4iV-3) 



2 



< 2^/ - 1 ||n|| 2 1 $ 1 (4iV-3)- J D, 



2 



Now 



3 



J k*(-«)l 4 ~ hi, 3 3 h?n) 



jkk' I jkk' I ' 



</ iniu^ / 23 \f(-vv)\* dv 



jkk' I 

Hence, inequality (J5J) and Assumption HI show that 

\(Tn m y(u,o)f du 



\q*(-u)\* 

< fo'Ml E / 23 °r [l«! 2(1 " r) l|,|>i + lM<i] koHiVv) 2 + If^dv 



jkk' I * 



jkk' l 
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with C = J [H 2(1 - r) l M> i + i|«|<i] (v 2 + \) 2l dv < oo as soon as r > 27 + 3/2. 
Then 



|(rn m )>,o)p 

|g*(-«)|* 



m k'+2N-2 



du < f -'\\ii\\ 2 A c 2 r k^cj2Yl E ( 2J ) 



4 7 +l 



< / - 1 ||n||iC r 2 fe - 4 C3(4Ar-3) 



j=J k'l k=k'-2N+2 
247+3 



247+3 _ 1 



D 



47+3 



Finally 



n\Q T n m (Yi)\ 2 ] < ^/o- 1 ||n||^(4^-3)afco-yC^S^^i^ 74: ' 2 



Then (T8J) gives 



/2^(fc + l)/3 fe || (Z || oo / ( r 1 ||n||ifc - 1 C(7,r,7V,<f ] 



27+13/4 



3n 



Then replacing n by p n in inequality (|19p gives 

E[ sup \Z^\T)\ 2 6H 2 } + < C ( ^e- k ^ + ) 

TeB f (m,m') 



( n 2 7 +13/4 
< (J' I — — " ^ 



■)27+4 2 -k£ 



2 Z r,l/4 



where C" and fc£ depend on r,N,j, $i,/ , ||II||a, IMU, Lfc( fc + l )Pk and 53 A /3fc. 
But there exists a positive constant if such that 



E 



27+13/4 -k^D 1 '?, 



< K 



Moreover D^t < n 1 / 8 and g n < n c with c + 1/8 < 1/2, which implies 



E E 



m'eM r , 



sup (^(r)! 2 -^^!! 

T£B f (m,m') 



D 



27+7/2 



< 



C" 



with Jf 2 = 12/ - 1 $imax(|| 9 || oo 2 2 ''+ 3 /(2 27+3 - l),$i). Thus, if p 2 (m,m') = 
y>2 ( m > m ') + Pa (m, m') with p^ 1 (n, m') = lC2||n|| 2 i I} 2 ^ , ~ 7/ /n and 
p( 2) (m,m') = ^ 2 ||n|| 2 (Efc/?*)^«/n. thm 



E E 



sup Z( 2 )(T) 2 -p 2 (m,m') 



In < 



C 2 
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6.8. Technical Lemmas 
Lemma 4. For all m > J 



E( 2J ) 

3= J 



P < 



2P 



2P-V 



-D p 



Proof of Lemma^ It is sufficient to write 



JUL op{m+l) _ opj 9P OP 
^{V) P = : < — r2 pm < — tD<:„ 

3 = J 



2P-1 



2P - 1 



2P-1 



Lemma 5. If \tp*(x)\ < k 3 (x 2 + 1) T l 2 for all real x then 
• if s and a are reals such that sr > a + 1 



ifr > 1 



\<P*(X)\ S {X 2 + l) a ^ 2 dx < C S , Q < OO 



\ip*(y)ip*(x - y)\dy < C r (\x\ X r l| x |>i + t\ x \<i) 



Proof of Lemma\Bi 

• For the first point, it is sufficient to observe that the function (x 2 
l)(->"s+a)/2 ig i n tegrable if -rs + a > -1. 

• By changing the variable (y — xu), we get 



(y)if* (x - y)\dy = / \<p*(xu)(p* (x(l - u))\xdu 



< I ks\xu\ r k3\x(l — u)\ r \x\du 

' | | > 1/3 and |1-m|>1/3 



+ / -3 

'|«|<l/3 



k 2 \x(l~u)\ r \x\du + 



|1— u|<l/3 



/c 2 |:eu| r |a;|du 



2ori |l-2r 



\<p*(y)ip*(x-y)\dy < kp r \x\ 



-k^lxl 1 -" 



< k 2 z 



2 
3 

2.3 2 ^" 1 , 



M>l/3 \ u 
3 
2 



r - 1 



x 



l-2r +2 2-r 3 r-l| a ,|l-r 



Thus, if \x\ > 1, f\<ff(y)<p*(x-y)\dy< C^x] 1 ^ and if \x\ < 1, / \tp* {y)y* (x - 
y)\dy < C r with C r = fc 2 (2.3 2r ~7 (r - 1) + 2 2 - r 3'- 1 ). 
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Lemma 6. Let T±, . . . , T n be independent random variables and 

n 

i/„(r) = (l/n)£[r(T 4 )-E(r(T i )] > 

i=l 

for r belonging to a countable class 1Z of measurable functions. Then, for e > 0, 
E[sup K(r)| 2 - 6H 2 ] + < C (-e~ k ^ + Ml e ~^) (19) 
with k\ = 1/6, fca = l/(21\/2) and C a universal constant and where 

sup IMloo < Mi, E ( sup \v n (r)\) < H, sup -VVarfrfr,)) < w. 
reR \reiz J ren n i=1 

Usual density arguments allow using this result with non-countable class of 
functions 1Z. 

Proof of Lemma El - We apply the Talagrand concentration inequality given 
in lKlein and Rio! (|2005l ) to the functions s % (x) — r(x) — E(r(Tj)) and we obtain 

P(sup|i/„(r)| >2T + A)<exp -- 



Ye& V 2(v + 4HM 1 ) +6M1A 



Then we modify this inequality following iBirge and Massartl (|1998l ) Corollary 2 
p. 354. It gives 

„/ , m „ . > ( n /A 2 rninfn, 1)A 

P sup \u n r > 1 + + A < exp --min -, 7 

To find inequality (0 we use the formula E[X] + = P(X > t)dt with 
X = su Pren \ Vn (r)\ 2 -6H 2 . 

Lemma 7. i Viennel 1 199$ ) ) Let (Tj) a strictly stationary process with (3-mixing 



coefficients (3k ■ Then there exists a function b such that 

E[6(Ti)] <J2$k and E[6 2 (Ti)] < 2^(A; + l)/3 fc 
k k 

and for all function tp (such that E[?/; 2 (Ti)] < 00) and for all N 

N 

Var£>(Ti)) < 4AHE[|V>| 2 (T 1 )6(T 1 )]. 
In particular, for functions (V^a), Ea Var (Eti ^(T,)) < 4iV(^ fe /3 fc )|| £ A |^a| 2 ||c 
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